Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object. This provides no
functional change, as it is essentially a macro substitution.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
struct object is one of the major data structures dealing with object
IDs. Convert it to use struct object_id instead of an unsigned char
array. Convert get_object_hash to refer to the new member as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash. Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Instead of open-coding the function pop_commit() just call it. This
makes the intent clearer and reduces code size.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we already know the length of a string (e.g., because
we just malloc'd to fit it), it's nicer to use memcpy than
strcpy, as it makes it more obvious that we are not going to
overflow the buffer (because the size we pass matches the
size in the allocation).
This also eliminates calls to strcpy, which make auditing
the code base harder.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you init or clone an empty repository, the initial
message from running "git log" is not very friendly:
$ git init
Initialized empty Git repository in /home/peff/foo/.git/
$ git log
fatal: bad default revision 'HEAD'
Let's detect this situation and write a more friendly
message:
$ git log
fatal: your current branch 'master' does not have any commits yet
We also detect the case that 'HEAD' points to a broken ref;
this should be even less common, but is easy to see. Note
that we do not diagnose all possible cases. We rely on
resolve_ref, which means we do not get information about
complex cases. E.g., "--default master" would use dwim_ref
to find "refs/heads/master", but we notice only that
"master" does not exist. Similarly, a complex sha1
expression like "--default HEAD^2" will not resolve as a
ref.
But that's OK. We fall back to a generic error message in
those cases, and they are unlikely to be used anyway.
Catching an empty or broken "HEAD" improves the common case,
and the other cases are not regressed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We create a file BISECT_TERMS in the repository .git to be read during a
bisection. There's no user-interface yet, but "git bisect" works if terms
other than old/new or bad/good are set in .git/BISECT_TERMS. The
fonctions to be changed if we add new terms are quite few.
In git-bisect.sh:
check_and_set_terms
bisect_voc
Co-authored-by: Louis Stuber <stuberl@ensimag.grenoble-inp.fr>
Tweaked-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Antoine Delaite <antoine.delaite@ensimag.grenoble-inp.fr>
Signed-off-by: Louis Stuber <stuberl@ensimag.grenoble-inp.fr>
Signed-off-by: Valentin Duperray <Valentin.Duperray@ensimag.imag.fr>
Signed-off-by: Franck Jonas <Franck.Jonas@ensimag.imag.fr>
Signed-off-by: Lucien Kong <Lucien.Kong@ensimag.imag.fr>
Signed-off-by: Thomas Nguy <Thomas.Nguy@ensimag.imag.fr>
Signed-off-by: Huynh Khoi Nguyen Nguyen <Huynh-Khoi-Nguyen.Nguyen@ensimag.imag.fr>
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for adding date modes that may carry extra
information beyond the mode itself, this patch converts the
date_mode enum into a struct.
Most of the conversion is fairly straightforward; we pass
the struct as a pointer and dereference the type field where
necessary. Locations that declare a date_mode can use a "{}"
constructor. However, the tricky case is where we use the
enum labels as constants, like:
show_date(t, tz, DATE_NORMAL);
Ideally we could say:
show_date(t, tz, &{ DATE_NORMAL });
but of course C does not allow that. Likewise, we cannot
cast the constant to a struct, because we need to pass an
actual address. Our options are basically:
1. Manually add a "struct date_mode d = { DATE_NORMAL }"
definition to each caller, and pass "&d". This makes
the callers uglier, because they sometimes do not even
have their own scope (e.g., they are inside a switch
statement).
2. Provide a pre-made global "date_normal" struct that can
be passed by address. We'd also need "date_rfc2822",
"date_iso8601", and so forth. But at least the ugliness
is defined in one place.
3. Provide a wrapper that generates the correct struct on
the fly. The big downside is that we end up pointing to
a single global, which makes our wrapper non-reentrant.
But show_date is already not reentrant, so it does not
matter.
This patch implements 3, along with a minor macro to keep
the size of the callers sane.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function is called only from one place, which makes sure to have
`interesting_cache` not NULL. Additionally the variable is a
dereferenced a few lines before unconditionally, which would have
resulted in a segmentation fault before hitting this check.
Signed-off-by: Stefan Beller <sbeller@google.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are traversing commit parents along the
UNINTERESTING side of a revision walk, we do not care if
the parent turns out to be missing. That lets us limit
traversals using unreachable and possibly incomplete
sections of history. However, we do still print error
messages about the missing commits; this patch suppresses
the error, as well.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We set revs->ignore_missing_links to instruct the
revision-walking machinery that we know the history graph
may be incomplete. For example, we use it when walking
unreachable but recent objects; we want to add what we can,
but it's OK if the history is incomplete.
However, we still print error messages for the missing
objects, which can be confusing. This is not an error, but
just a normal situation when transitioning from a repository
last pruned by an older git (which can leave broken segments
of history) to a more recent one (where we try to preserve
whole reachable segments).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change typedef each_ref_fn to take a "const struct object_id *oid"
parameter instead of "const unsigned char *sha1".
To aid this transition, implement an adapter that can be used to wrap
old-style functions matching the old typedef, which is now called
"each_ref_sha1_fn"), and make such functions callable via the new
interface. This requires the old function and its cb_data to be
wrapped in a "struct each_ref_fn_sha1_adapter", and that object to be
used as the cb_data for an adapter function, each_ref_fn_adapter().
This is an enormous diff, but most of it consists of simple,
mechanical changes to the sites that call any of the "for_each_ref"
family of functions. Subsequent to this change, the call sites can be
rewritten one by one to use the new interface.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are limiting a rev-list traversal due to
UNINTERESTING refs, we have to walk down the tips (both
interesting and uninteresting) to find where they intersect.
We keep a queue of commits to examine, pop commits off
the queue one by one, and potentially add their parents. The
size of the queue will naturally fluctuate based on the
"width" of the history graph; i.e., the number of
simultaneous lines of development. But for the most part it
will stay in the same ballpark as the initial number of tips
we fed, shrinking over time (as we hit common ancestors of
the tips). So roughly speaking, if we start with `N` tips,
we'll spend much of the time with a queue around `N` items.
For each UNINTERESTING commit we pop, we call
still_interesting to check whether marking its parents as
UNINTERESTING has made the whole queue uninteresting (in
which case we can quit early). Because the queue is stored
as a linked list, this is `O(N)`, where `N` is the number of
items in the queue. So processing a queue with `N` commits
marked UNINTERESTING (and one or more interesting commits)
will take `O(N^2)`.
If you feed a lot of positive tips, this isn't a problem.
They aren't UNINTERESTING, so they don't incur the
still_interesting check. It also isn't a problem if you
traverse from an interesting tip to some UNINTERESTING
bases. We order the queue by recency, so the interesting
commits stay at the front of the queue as we walk down them.
The linear check can exit early as soon as it sees one
interesting commit left in the queue.
But if you want to know whether an older commit is reachable
from a set of newer tips, we end up processing in the
opposite direction: from the UNINTERESTING ones down to the
interesting one. This may happen when we call:
git rev-list $commits --not --all
in check_everything_connected after a fetch. If we fetched
something much older than most of our refs, and if we have a
large number of refs, the traversal cost is dominated by the
quadratic behavior.
These commands simulate the connectivity check of such a
fetch, when you have `$n` distinct refs in the receiver:
# positive ref is 100,000 commits deep
git rev-list --all | head -100000 | tail -1 >input
# huge number of more recent negative refs
git rev-list --all | head -$n | sed s/^/^/ >>input
time git rev-list --stdin <input
Here are timings for various `n` on the linux.git
repository. The `n=1` case provides a baseline for just
walking the commits, which lets us see the still_interesting
overhead. The times marked with `+` subtract that baseline
to show just the extra time growth due to the large number
of refs. The `x` numbers show the slowdown of the adjusted
time versus the prior trial.
n | before | after
--------------------------------------------------------
1 | 0.991s | 0.848s
10000 | 1.120s (+0.129s) | 0.885s (+0.037s)
20000 | 1.451s (+0.460s, 3.5x) | 0.923s (+0.075s, 2.0x)
40000 | 2.731s (+1.740s, 3.8x) | 0.994s (+0.146s, 1.9x)
80000 | 8.235s (+7.244s, 4.2x) | 1.123s (+0.275s, 1.9x)
Each trial doubles `n`, so you can see the quadratic (`4x`)
behavior before this patch. Afterwards, we have a roughly
linear relationship.
The implementation is fairly straightforward. Whenever we do
the linear search, we cache the interesting commit we find,
and next time check it before doing another linear search.
If that commit is removed from the list or becomes
UNINTERESTING itself, then we fall back to the linear
search. This is very similar to the trick used by fce87ae
(Fix quadratic performance in rewrite_one., 2008-07-12).
I considered and rejected several possible alternatives:
1. Keep a count of UNINTERESTING commits in the queue.
This requires managing the count not only when removing
an item from the queue, but also when marking an item
as UNINTERESTING. That requires touching the other
functions which mark commits, and would require knowing
quickly which commits are in the queue (lookup in the
queue is linear, so we would need an auxiliary
structure or to also maintain an IN_QUEUE flag in each
commit object).
2. Keep a separate list of interesting commits. Drop items
from it when they are dropped from the queue, or if
they become UNINTERESTING. This again suffers from
extra complexity to maintain the list, not to mention
CPU and memory.
3. Use a better data structure for the queue. This is
something that could help the fix in fce87ae, because
we order the queue by recency, and it is about
inserting quickly in recency order. So a normal
priority queue would help there. But here, we cannot
disturb the order of the queue, which makes things
harder. We really do need an auxiliary index to track
the flag we care about, which is basically option (2)
above.
The "cache" trick is simple, and the numbers above show that
it works well in practice. This is because the length of
time it takes to find an interesting commit is proportional
to the length of time it will remain cached (i.e., if we
have to walk a long way to find it, it also means we have to
pop a lot of elements in the queue until we get rid of it
and have to find another interesting commit).
The worst case is still quadratic, though. We could have `N`
uninteresting commits at the front of the queue, followed by
`N` interesting commits, where commit `i` has parent `i+N`.
When we pop commit `i`, we will notice that the parent of
the next commit, `i+1+N` is still interesting and cache it.
But then handling commit `i+1`, we will mark its parent
`i+1+N` uninteresting, and immediately invalidate our cache.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rev-list --bisect is used by git bisect, but never together with
--first-parent. Because rev-list --bisect together with --first-parent
is not handled currently, and even leads to segfaults, refuse to use
both options together.
Because this is not supported, it makes little sense to use git log
--bisect --first parent either, because refs/heads/bad is not limited to
the first parent chain.
Helped-by: Junio C. Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because "--graph" is about connected history while --no-walk is
about discrete points, it does not make sense to allow these two
options at the same time. [1]
This change makes a few calls to "show --graph" fail in t4052, but
asking to show one commit with graph is a nonsensical thing to do.
Thus, tests on "show --graph" in t4052 have been removed [2,3].
Same tests on "show" without --graph option have already been tested
in 4052.
3 testcases have been added to test this patch.
[1]: http://article.gmane.org/gmane.comp.version-control.git/216083
[2]: http://article.gmane.org/gmane.comp.version-control.git/264950
[3]: http://article.gmane.org/gmane.comp.version-control.git/265107
Helped-By: Eric Sunshine <sunshine@sunshineco.com>
Helped-By: René Scharfe <l.s.r@web.de>
Helped-By: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log --grep=<string>" shows only commits with messages that
match the given string, but sometimes it is useful to be able to
show only commits that do *not* have certain messages (e.g. "show
me ones that are not FIXUP commits").
Originally, we had the invert-grep flag in grep_opt, but because
"git grep --invert-grep" does not make sense except in conjunction
with "--files-with-matches", which is already covered by
"--files-without-matches", it was moved it to revisions structure.
To have the flag there expresses the function to the feature better.
When the newly inserted two tests run, the history would have commits
with messages "initial", "second", "third", "fourth", "fifth", "sixth"
and "Second", committed in this order. The commits that does not match
either "th" or "Sec" is "second" and "initial". For the case insensitive
case only "initial" matches.
Signed-off-by: Christoph Junghans <ottxor@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit fbd4a70 (list-objects: mark more commits as edges in
mark_edges_uninteresting - 2013-08-16), we marked an increasing number
of edges uninteresting. This change, and the subsequent change to make
this conditional on --objects-edge, are used by --thin to make much
smaller packs for shallow clones.
Unfortunately, they cause a significant performance regression when
pushing non-shallow clones with lots of refs (23.322 seconds vs.
4.785 seconds with 22400 refs). Add an option to git rev-list,
--objects-edge-aggressive, that preserves this more aggressive behavior,
while leaving --objects-edge to provide more performant behavior.
Preserve the current behavior for the moment by using the aggressive
option.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The callers of get_merge_bases() can choose to leave object flags
used during the merge-base traversal by passing cleanup=0 as a
parameter, but in practice a very few callers can afford to do so
(namely, "git merge-base"), as they need to compute merge base in
preparation for other processing of their own and they need to see
the object without contaminate flags.
Change the function signature of get_merge_bases_many() and
get_merge_bases() to drop the cleanup parameter, so that the
majority of the callers do not have to say ", 1" at the end.
Give a new get_merge_bases_many_dirty() API to support only a few
callers that know they do not need to spend cycles cleaning up the
object flags.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is currently no easy way to ask the revision traversal
machinery to include objects reachable from the index (e.g.,
blobs and trees that have not yet been committed). This
patch adds an option to do so.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call traverse_commit_list, we may have trees and
blobs in the pending array. As we process these, we pass the
"name" field from the pending entry as the path of the
object within the tree (which then becomes the root path if
we recurse into subtrees).
When we set up the traversal in prepare_revision_walk,
though, the "name" field of any pending trees and blobs is
likely to be the ref at which we found the object. We would
not want to make this part of the path (e.g., doing so would
make "git rev-list --objects v2.6.11-tree" in linux.git show
paths like "v2.6.11-tree/Makefile", which is nonsensical).
Therefore prepare_revision_walk sets the name field of each
pending tree and blobs to the empty string.
However, this leaves no room for a caller who does know the
correct path of a pending object to propagate that
information to the revision walker. We can fix this by
making two related changes:
1. Use the "path" field as the path instead of the "name"
field in traverse_commit_list. If the path is not set,
default to "" (which is what we always ended up with in
the current code, because of prepare_revision_walk).
2. In prepare_revision_walk, make a complete copy of the
entry. This makes the path field available to the
walker (if there is one), solving our problem.
Leaving the name field intact is now OK, as we do not
use it as a path due to point (1) above (and we can use
it to make more meaningful error messages if we want).
We also make the original "mode" field available to the
walker, though it does not actually use it.
Note that we still re-add the pending objects and free the
old ones (so we may strdup the path and name only to free
the old ones). This could be made more efficient by simply
copying the object_array entries that we are keeping.
However, that would require more restructuring of the code,
and is not done here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to add all reflog entries as tips for finding
reachable objects. The revision machinery can already do
this (to support "rev-list --reflog"); we can reuse that
code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we enter prepare_revision_walk, we have zero or more
entries in our "pending" array. We disconnect that array
from the rev_info, and then process each entry:
1. If the entry is a commit and the --source option is in
effect, we keep a pointer to the object name.
2. Otherwise, we re-add the item to the pending list with
a blank name.
We then throw away the old array by freeing the array
itself, but do not touch the "name" field of each entry. For
any items of type (2), we leak the memory associated with
the name. This commit fixes that by calling object_array_clear,
which handles the cleanup for us.
That breaks (1), though, because it depends on the memory
pointed to by the name to last forever. We can solve that by
making a copy of the name. This is slightly less efficient,
but it shouldn't matter in practice, as we do it only for
the tip commits of the traversal.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous commit, we made add_name_decoration global
so that adders would not have to access the hash directly.
We now make the hash itself static so that callers _have_ to
add through our function, making sure that all additions go
through a single point. To do this, we have to add one more
accessor function: a way to lookup entries in the hash.
Since the only caller doesn't actually look at the returned
value, but rather only asks whether there is a decoration or
not, we could provide only a boolean "has_name_decoration".
That would allow us to make "struct name_decoration" local
to log-tree, as well.
However, it's unlikely to cause any maintainability harm
making the actual data public, and this interface is more
flexible if we need to look at decorations from other parts
of the code in the future.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once upon a time, we parsed pretty options by looking for
"--pretty" at the start of the string, and then feeding the
rest (including an "=") to get_commit_format. Later, commit
48ded91 (log --pretty: do not accept bogus "--prettyshort",
2008-05-25) split this into a separate check for "--pretty"
versus "--pretty=".
However, when parsing "--pretty", we still passed "arg+8" to
get_commit_format. This is useless, since it will always
point to the NUL terminator at the end of the string. We can
simply pass NULL instead; both parameters are treated the
same by get_commit_format.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like the callsites in the previous commit, logmsg_reencode
already falls back to read_sha1_file when necessary.
However, I split its conversion out into its own commit
because it's a bit more complex.
We return either:
1. The original commit->buffer
2. A newly allocated buffer from read_sha1_file
3. A reencoded buffer (based on either 1 or 2 above).
while trying to do as few extra reads/allocations as
possible. Callers currently free the result with
logmsg_free, but we can simplify this by pointing them
straight to unuse_commit_buffer. This is a slight layering
violation, in that we may be passing a buffer from (3).
However, since the end result is to free() anything except
(1), which is unlikely to change, and because this makes the
interface much simpler, it's a reasonable bending of the
rules.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The return value from logmsg_reencode may be either a newly
allocated buffer or a pointer to the existing commit->buffer.
We would not want the caller to accidentally free() or
modify the latter, so let's mark it as const. We can cast
away the constness in logmsg_free, but only once we have
determined that it is a free-able buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This mistyped command line simply ignores "master" and ends up
showing two commits from the current HEAD:
$ git log -2master
because we feed "2master" to atoi() without making sure that the
whole string is parsed as an integer.
Use the strtol_i() helper function instead.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These two commands are supposed to be equivalent:
$ git log --exclude=refs/notes/\* --all --no-merges --since=2.days |
git shortlog
$ git shortlog --exclude=refs/notes/\* --all --no-merges --since=2.days
However, the latter does not understand the ref-exclusion command
line option, even though other options understood by "log", such as
"--all" and "--no-merges", are understood.
This was because e7b432c5 (revision: introduce --exclude=<glob> to
tame wildcards, 2013-08-30) did not wire the new option fully to the
machinery. A new option understood by handle_revision_pseudo_opt()
must be told to handle_revision_opt() as well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pack-objects is computing the reachability bitmap to
serve a fetch request, it can erroneously die() if some of
the UNINTERESTING objects are not present. Upload-pack
throws away HAVE lines from the client for objects we do not
have, but we may have a tip object without all of its
ancestors (e.g., if the tip is no longer reachable and was
new enough to survive a `git prune`, but some of its
reachable objects did get pruned).
In the non-bitmap case, we do a revision walk with the HAVE
objects marked as UNINTERESTING. The revision walker
explicitly ignores errors in accessing UNINTERESTING commits
to handle this case (and we do not bother looking at
UNINTERESTING trees or blobs at all).
When we have bitmaps, however, the process is quite
different. The bitmap index for a pack-objects run is
calculated in two separate steps:
First, we perform an extensive walk from all the HAVEs to
find the full set of objects reachable from them. This walk
is usually optimized away because we are expected to hit an
object with a bitmap during the traversal, which allows us
to terminate early.
Secondly, we perform an extensive walk from all the WANTs,
which usually also terminates early because we hit a commit
with an existing bitmap.
Once we have the resulting bitmaps from the two walks, we
AND-NOT them together to obtain the resulting set of objects
we need to pack.
When we are walking the HAVE objects, the revision walker
does not know that we are walking it only to mark the
results as uninteresting. We strip out the UNINTERESTING flag,
because those objects _are_ interesting to us during the
first walk. We want to keep going to get a complete set of
reachable objects if we can.
We need some way to tell the revision walker that it's OK to
silently truncate the HAVE walk, just like it does for the
UNINTERESTING case. This patch introduces a new
`ignore_missing_links` flag to the `rev_info` struct, which
we set only for the HAVE walk.
It also adds tests to cover UNINTERESTING objects missing
from several positions: a missing blob, a missing tree, and
a missing parent commit. The missing blob already worked (as
we do not care about its contents at all), but the other two
cases caused us to die().
Note that there are a few cases we do not need to test:
1. We do not need to test a missing tree, with the blob
still present. Without the tree that refers to it, we
would not know that the blob is relevant to our walk.
2. We do not need to test a tip commit that is missing.
Upload-pack omits these for us (and in fact, we
complain even in the non-bitmap case if it fails to do
so).
Reported-by: Siddharth Agarwal <sid0@fb.com>
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Option explanation is in rev-list-options.txt. The interaction with -z
is left undecided.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the "rev-list" analogue to 25fba78 (cat-file:
disable object/refname ambiguity check for batch mode,
2013-07-12). Like cat-file, "rev-list --stdin" may read a
large number of sha1 object names, and the warning check
introduces a significant slow-down.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps reduce the number of match_pathspec_depth() call sites and
show how match_pathspec_depth() is used.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make it clear that we don't use fnmatch() anymore.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since diff_tree_sha1() can now accept empty trees via NULL sha1, we
could just call it without manually reading trees into tree_desc and
duplicating code.
Besides, that
if (!tree)
return 0;
looked suspect - we were saying an invalid tree != empty tree, but maybe it is
better to just say the tree is invalid here, which is what diff_tree_sha1()
does for such case.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the previous fix 895c5ba3 (revision: do not peel tags used in
range notation, 2013-09-19), handle_revision_arg() that processes
command line arguments for the "git log" family of commands no
longer directly places the object pointed by the tag in the pending
object array when it sees a tag object. We used to place pointee
there after copying the flag bits like UNINTERESTING and
SYMMETRIC_LEFT.
This change meant that any flag that is relevant to later history
traversal must now be propagated to the pointed objects (most often
these are commits) while starting the traversal, which is partly
done by handle_commit() that is called from prepare_revision_walk().
We did propagate UNINTERESTING, but did not do so for others, most
notably SYMMETRIC_LEFT. This caused "git log --left-right v1.0..."
(where "v1.0" is a tag) to start losing the "leftness" from the
commit the tag points at.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rev-list --objects ^A^{tree} B^{tree}" ought to mean "I want a
list of objects inside B's tree, but please exclude the objects that
appear inside A's tree".
we see the top-level tree marked as uninteresting (i.e. ^A^{tree} in
the above example) and call mark_tree_uninteresting() on it; this
unfortunately prevents us from recursing into the tree and marking
the objects in the tree as uninteresting.
The reason why "git log ^A A" yields an empty set of commits,
i.e. we do not have a similar issue for commits, is because we call
mark_parents_uninteresting() after seeing an uninteresting commit.
The uninteresting-ness of the commit itself does not prevent its
parents from being marked as uninteresting.
Introduce mark_tree_contents_uninteresting() and structure the code
in handle_commit() in such a way that it makes it the responsibility
of the callchain leading to this function to mark commits, trees and
blobs as uninteresting, and also make it the responsibility of the
helpers called from this function to mark objects that are reachable
from them.
Note that this is a very old bug that probably dates back to the day
when "rev-list --objects" was introduced. The line to clear
tree->object.parsed at the end of mark_tree_contents_uninteresting()
can be removed when this fix is merged to the codebase after
6e454b9a (clear parsed flag when we free tree buffers, 2013-06-05).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Normally parse_pathspec() is used on command line arguments where it
can do fancy thing like parsing magic on each argument or adding magic
for all pathspecs based on --*-pathspecs options.
There's another use of parse_pathspec(), where pathspec is needed, but
the input is known to be pure paths. In this case we usually don't
want --*-pathspecs to interfere. And we definitely do not want to
parse magic in these paths, regardless of --literal-pathspecs.
Add new flag PATHSPEC_LITERAL_PATH for this purpose. When it's set,
--*-pathspecs are ignored, no magic is parsed. And if the caller
allows PATHSPEC_LITERAL (i.e. the next calls can take literal magic),
then PATHSPEC_LITERAL will be set.
This fixes cases where git chokes when GIT_*_PATHSPECS are set because
parse_pathspec() indicates it won't take any magic. But
GIT_*_PATHSPECS add them anyway. These are
export GIT_LITERAL_PATHSPECS=1
git blame -- something
git log --follow something
git log --merge
"git ls-files --with-tree=path" (aka parse_pathspec() in
overlay_tree_on_cache()) is safe because the input is empty, and
producing one pathspec due to PATHSPEC_PREFER_CWD does not take any
magic into account.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit enables users of `struct rev_info` to peform custom limiting
during a revision walk (i.e. `get_revision`).
If the field `include_check` has been set to a callback, this callback
will be issued once for each commit before it is added to the "pending"
list of the revwalk. If the include check returns 0, the commit will be
marked as added but won't be pushed to the pending list, effectively
limiting the walk.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parse_pathspec() is called with no paths, the behavior could be
either return no paths, or return one path that is cwd. Some commands
do the former, some the latter. parse_pathspec() itself does not make
either the default and requires the caller to specify either flag if
it may run into this situation.
I've grep'd through all parse_pathspec() call sites. Some pass
neither, but those are guaranteed never pass empty path to
parse_pathspec(). There are two call sites that may pass empty path
and are fixed with this patch.
[jc: added a test from Antoine's bug report]
Reported-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A range notation "A..B" means exactly the same thing as what "^A B"
means, i.e. the set of commits that are reachable from B but not
from A. But the internal representation after the revision parser
parsed these two notations are subtly different.
- "rev-list ^A B" leaves A and B in the revs->pending.objects[]
array, with the former marked as UNINTERESTING and the revision
traversal machinery propagates the mark to underlying commit
objects A^0 and B^0.
- "rev-list A..B" peels tags and leaves A^0 (marked as
UNINTERESTING) and B^0 in revs->pending.objects[] array before
the traversal machinery kicks in.
This difference usually does not matter, but starts to matter when
the --objects option is used. For example, we see this:
$ git rev-list --objects v1.8.4^1..v1.8.4 | grep $(git rev-parse v1.8.4)
$ git rev-list --objects v1.8.4 ^v1.8.4^1 | grep $(git rev-parse v1.8.4)
04f013dc38 v1.8.4
With the former invocation, the revision traversal machinery never
hears about the tag v1.8.4 (it only sees the result of peeling it,
i.e. the commit v1.8.4^0), and the tag itself does not appear in the
output. The latter does send the tag object itself to the output.
Make the range notation keep the unpeeled objects and feed them to
the traversal machinery to fix this inconsistency.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is useful to make sure we don't step outside the boundaries of what
we are interpreting at the moment. For example while interpreting
foobar@{u}~1, the job of interpret_branch_name() ends right before ~1,
but there's no way to figure that out inside the function, unless the
len argument is passed.
So let's do that.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
People often find "git log --branches" etc. that includes _all_
branches is cumbersome to use when they want to grab most but except
some. The same applies to --tags, --all and --glob.
Teach the revision machinery to remember patterns, and then upon the
next such a globbing option, exclude those that match the pattern.
With this, I can view only my integration branches (e.g. maint,
master, etc.) without topic branches, which are named after two
letters from primary authors' names, slash and topic name.
git rev-list --no-walk --exclude=??/* --branches |
git name-rev --refs refs/heads/* --stdin
This one shows things reachable from local and remote branches that
have not been merged to the integration branches.
git log --remotes --branches --not --exclude=??/* --branches
It may be a bit rough around the edges, in that the pattern to give
the exclude option depends on what globbing option follows. In
these examples, the pattern "??/*" is used, not "refs/heads/??/*",
because the globbing option that follows the -"-exclude=<pattern>"
is "--branches". As each use of globbing option resets previously
set "--exclude", this may not be such a bad thing, though.
Signed-off-by: Junio C Hamano <gitster@pobox.com>