git_commit_non_empty_tree is added to the functions that can be run from
commit filters. Its effect is to skip commits that do not change the tree
relative to their parent, while still committing merge points.
The option --prune-empty is added. It defaults the commit-filter to
'git_commit_non_empty_tree "$@"', and can be used with any other
combination of filters except --commit-filter. In that case, one must use
'git_commit_non_empty_tree "$@"' where one would usually put
'git commit-tree "$@"' to achieve the same result.
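The decision described above can be illustrated with a plain-shell sketch. It is not filter-branch's own code. A commit is skipped when it has exactly one parent and the same tree as that parent. Otherwise a commit is made, so merges and root commits are kept. The sketch assumes, as with any commit filter, that the arguments are those of git commit-tree: <tree> [-p <parent>]... The helpers tree_of and commit_tree are hypothetical stubs that stand in for git rev-parse and git commit-tree, so the sketch runs without a repository.

```sh
#!/bin/sh
# Sketch of the pruning decision; not filter-branch's own code.
# A commit filter receives the arguments of "git commit-tree":
#   <tree> [-p <parent>]...
# tree_of and commit_tree are hypothetical stubs for
# "git rev-parse <commit>^{tree}" and "git commit-tree".

tree_of() {
	case "$1" in
	P1) echo T1 ;;          # pretend parent P1 has tree T1
	*)  echo unknown ;;
	esac
}

commit_tree() {
	echo "new-commit-for-$1"    # pretend to create a commit for tree $1
}

prune_or_commit() {
	# Exactly one parent and the same tree as that parent:
	# nothing changed, so reuse the parent instead of committing.
	if test $# = 3 && test "$1" = "$(tree_of "$3")"
	then
		echo "$3"
	else
		commit_tree "$@"
	fi
}

prune_or_commit T1 -p P1          # unchanged tree: prints P1
prune_or_commit T2 -p P1          # changed tree: new commit
prune_or_commit T1 -p P1 -p P2    # merge: always committed
```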
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tag rewriting code used a 'sed' expression to substitute the new tag
name into the corresponding field of the annotated tag object. But this is
problematic if the tag name contains special characters. In particular,
if the tag name contained a slash, then the 'sed' expression had a syntax
error. We now protect against this by using 'printf' to assemble the
tag header.
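The following is a minimal sketch of the printf approach. The object name, type and tag name are hypothetical values. A real tag header also carries a tagger line, and the message follows it. Because printf's %s copies its argument verbatim, a slash in the name needs no escaping. In a sed s/// command, the same slash would end the pattern early.

```sh
#!/bin/sh
# Sketch: build the header of a rewritten annotated tag with printf.
# new_sha1, new_type and new_name are hypothetical example values.
new_sha1=0123456789abcdef0123456789abcdef01234567
new_type=commit
new_name='release/v1.0'      # this slash would break a sed s/// expression

# printf's %s treats its argument as plain data, so '/', '&' or '\'
# in the tag name need no escaping.
header=$(printf 'object %s\ntype %s\ntag %s\n' "$new_sha1" "$new_type" "$new_name")
printf '%s\n' "$header"
```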
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use rev-list --simplify-merges everywhere. This changes the behaviour
of --subdirectory-filter in cases such as
O -- A -\
 \       \
  \- B -- M
where A and B bring the same changes to the subdirectory: It now keeps
both sides of the merge. Previously, the history would have been
simplified to 'O -- A'. Merges of unrelated side histories that never
touch the subdirectory are still removed.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous ancestor discovery code failed on any refs that are
(pre-rewrite) ancestors of commits marked for rewriting. This means
that in a situation
A -- B(topic) -- C(master)
where B is dropped by --subdirectory-filter pruning, the 'topic' was
not moved up to A as intended, but left unrewritten because we asked
about 'git rev-list ^master topic', which does not return anything.
Instead, we use the straightforward
git rev-list -1 $ref -- $filter_subdir
to find the right ancestor. To justify this, note that the nearest
ancestor is unique: We use the output of
git rev-list --parents -- $filter_subdir
to rewrite commits in the first pass, before any ref rewriting. If B
is a non-merge commit, the only candidate is its parent. If it is a
merge, there are two cases:
- All sides of the merge bring the same subdirectory contents. Then
rev-list already pruned away the merge in favour of just one of its
parents, so there is only one candidate.
- Some merge sides, or the merge outcome, differ. Then the merge is
not pruned and can be rewritten directly.
So it is always safe to use rev-list -1.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, git-filter-branch failed if it attempted to update an
annotated tag. Now we ignore this condition if --tag-name-filter is
given, so that we can later rewrite the tag. If no such option was
provided, we warn the user that he might want to run with
"--tag-name-filter cat" to achieve the intended effect.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 46eb449c restricted git-filter-branch to non-bare repositories
unnecessarily; git-filter-branch can work on bare repositories just
fine.
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit cfabd6eee1. I had
implemented it without understanding what --full-history does. Consider
this history:
  C--M--N
 /  /  /
A--B  /
 \   /
  D-/
where B and C modify a path, X, in the same way so that the result is
identical, and D does not modify it at all. With the path limiter X and
without --full-history this is simplified to
A--B
i.e. only one of the paths via B or C is chosen. I had assumed that
--full-history would keep both paths like this
  C--M
 /  /
A--B
removing the path via D; but in fact it keeps the entire history.
Currently, git does not have the capability to simplify to this
intermediary case. However, the other extreme to keep the entire history
is not wanted either in usual cases. I think we can expect that histories
like the above are rare, and in the usual cases we want a simplified
history. So let's remove --full-history again.
(Concerning t7003, subsequent tests depend on what the test case sets up,
so we can't just back out the entire test case.)
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
filter-branch tries to restore "old" copies of some
environment variables by using the construct:
unset var
test -z "$old_var" || var="$old_var" && export var
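This is just wrong. The AND-list and OR-list operators && and || have
equal precedence and associate left to right, so the line behaves like
{ test -z "$old_var" || var="$old_var"; } && export var. Whichever side
of the || runs succeeds: the test when $old_var is empty, and the
assignment otherwise. As a result, we always end up exporting var.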
On bash and dash, exporting an unset variable has no effect. However, on
some shells (such as FreeBSD's /bin/sh), the shell exports the empty
value.
This manifested itself in this case as git-filter-branch setting
GIT_INDEX_FILE to the empty string, which in turn caused its call to
git-read-tree to fail, leaving the working tree pointing at the original
HEAD instead of the rewritten one.
To fix this, we change the short-circuit logic to better match the intent:
test -z "$old_var" || {
var="$old_var" && export var
}
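The precedence problem can be reproduced in a few lines. In this sketch, record_export is a hypothetical stand-in for export. It records each call, so the extra export becomes observable. With an empty $old_var, the original line still reaches the export, while the braced form does not.

```sh
#!/bin/sh
# "a || b && c" groups as "{ a || b; } && c" in POSIX sh.
# record_export is a hypothetical stand-in for "export" that records
# each call, so the difference between the two forms is visible.
record_export() { exported="$exported $1"; }

old_var=''      # nothing saved, so nothing should be restored
unset var

# Original form: the test succeeds, the assignment is skipped, and
# "&& record_export" still runs because the || list returned 0.
exported=''
test -z "$old_var" || var="$old_var" && record_export var
buggy=$exported

# Fixed form: the braces make assignment and export a single unit that
# is skipped entirely when $old_var is empty.
exported=''
test -z "$old_var" || { var="$old_var" && record_export var; }
fixed=$exported

echo "original form exported: [$buggy]"
echo "fixed form exported:    [$fixed]"
```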
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit d89c1df (filter-branch: don't use xargs -0, 2008-03-12) replaced a
'ls-files | xargs rm' pipeline by 'git clean'. 'git clean' however does
not recurse and remove directories by default.
Now, consider a tree-filter that renames a directory.
1. For the first commit everything works as expected
2. Then filter-branch checks out the files for the next commit. This
leaves the new directory behind because there is no real "branch
switching" involved that would notice that the directory can be
removed.
3. Then filter-branch invokes 'git clean' to remove exactly those
left-overs. But here it does not remove the directory.
4. The next tree-filter does not work as expected because there already
exists a directory with the new name.
Just add -d to 'git clean', so that empty directories are removed.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add support for creating a new tag object and retaining the tag message,
author, and date when rewriting tags. The gpg signature, if one exists,
will be stripped.
This adds nearly proper tag name filtering to filter-branch. Proper tag
name filtering would include the ability to change the tagger, tag date,
tag message, and _not_ strip a gpg signature if the tag did not change.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On some systems, 'sh' isn't very friendly. In particular,
t7003 fails on Solaris because it doesn't understand $().
Instead, use the specified SHELL_PATH to run shell code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some versions of xargs don't understand "-0"; fortunately in
this case we can get the same effect by using "git clean".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Specifying character ranges in tr differs between System V
and POSIX. In System V, brackets are required (e.g.,
'[A-Z]'), whereas in POSIX they are not.
We can mostly get around this by just using the bracket form
for both sets, as in:
tr '[A-Z]' '[a-z]'
in which case POSIX interprets this as "'[' becomes '['",
which is OK.
However, this doesn't work with multiple sequences, like:
# rot13
tr '[A-Z][a-z]' '[N-Z][A-M][n-z][a-m]'
where the POSIX version does not behave the same as the
System V version. In this case, we must simply enumerate the
sequence.
This patch fixes problematic uses of tr in git scripts and
test scripts in one of three ways:
- if a single sequence, make sure it uses brackets
- if multiple sequences, enumerate
- if extra brackets (e.g., tr '[A]' 'a'), eliminate
brackets
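The sketch below shows the first two rules, using hypothetical helper names. The first helper uses a single range with brackets on both sides. The second implements rot13 with every character spelled out, so no range syntax is involved.

```sh
#!/bin/sh
# Single range: bracket both operands. POSIX tr reads '[' and ']' as
# literal characters that simply map to themselves.
to_lower() {
	tr '[A-Z]' '[a-z]'
}

# Several ranges (rot13): enumerate every character so that no
# implementation has to interpret range syntax at all.
rot13() {
	tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' \
	   'NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm'
}

echo 'Hello' | to_lower
echo 'Hello' | rot13
```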
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The subdirectory filter had a bug in noticing when the commit in question
did not have anything in the path-limited part of the tree: $commit:$path
does not name an empty tree when $path does not appear in $commit.
This should fix it. The additional test in t7003 is originally from Kevin
Ballard but with fixups.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command used a very old fashioned construct to extract
filenames out of diff-index and ended up corrupting the output.
We can simply use --name-only and pipe into --stdin mode of
update-index. It's been like that for the past 2 years or so
since a94d994 (update-index: work with c-quoted name).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
filter-branch previously took the first non-option argument as the name for
a new branch. Since dfd05e38, it now takes a revision or a revision range
and modifies the current branch. Update to operate on HEAD by default to
conform with standard git interface practice.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the first things filter-branch does is to create a temporary
directory. This directory is eventually removed by the script during
normal operation, but is not removed if the script encounters an error.
Set a trap to remove it when the script terminates for any reason.
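Here is a minimal sketch of the idea. It is not the script's actual code. Trap 0 (the exit pseudo-signal) removes the directory whether the shell finishes normally or through exit after an error. The sketch assumes mktemp -d is available, which is common but not required by POSIX. It runs the work in a subshell so that the trap fires when that subshell ends.

```sh
#!/bin/sh
# Sketch: make a scratch directory disappear however the work ends.
# mktemp -d is assumed to be available (common, though not POSIX).
work_in_scratch_dir() {
	(
		scratch=$(mktemp -d) || exit 1
		# Trap 0 runs when this subshell exits, both normally and
		# through "exit N" on the error path below.
		trap 'rm -rf "$scratch"' 0
		echo "$scratch"
		false || exit 1         # simulate a failure halfway through
		echo 'not reached'
	)
}

dir=$(work_in_scratch_dir) || echo 'work failed (on purpose)'
echo "scratch directory was $dir"
```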
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git-filter-branch branch' could fail with the error
"Which ref do you want to rewrite?" if there was another branch
or tag whose name was 'branch-something' or 'something/branch'.
[jc: original report and fix were done between Dmitry Potapov
and Dscho; I rewrote it using "rev-parse --symbolic-full-name"]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I hesitate to suggest this, since GNU tr has accepted \n for 15 years,
but there are supposedly a few crufty vendor-supplied versions of tr still
in use. Also, all of the other uses of tr-with-newline in git use \012.
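For illustration, the following turns blanks into newlines with the octal escape (\012 is newline in octal). The name words_to_lines is hypothetical.

```sh
#!/bin/sh
# \012 is the octal code for newline; it avoids relying on tr
# understanding the "\n" escape.
words_to_lines() {
	tr ' ' '\012'
}

echo 'alpha beta gamma' | words_to_lines
```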
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was an attempt to list the refs that were rewritten by filtering
the output of 'git show-ref' for 'refs/original'. But the grep pattern
was wrong: it did not account for the SHA1 that is listed before each
ref.
Moreover, right before this summary is the loop that actually does the
rewriting, and the rewritten refs are listed there anyway. So this extra
summary is plainly too verbose.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some versions of 'tr' only accept octal codes if entered with three digits,
and therefore misinterpret the '\0' in the test suite.
Some versions of 'tr' reject the (needless) use of character classes.
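NUL can be written with all three octal digits, so that such tr versions do not misread it. The name nul_to_newline is hypothetical.

```sh
#!/bin/sh
# Spell NUL as the full three-digit octal escape \000, not \0.
nul_to_newline() {
	tr '\000' '\012'
}

printf 'one\000two\000' | nul_to_newline
```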
Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It might be POSIX, but there are shells that do not like the
expression 'export VAR=VAL'. To be on the safe side, rewrite them
into 'VAR=VAL' and 'export VAR'.
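This is the portable spelling, shown with example values. A single export command also accepts several names at once.

```sh
#!/bin/sh
# Assign first, then export in a separate command.
GIT_AUTHOR_NAME='A U Thor'
GIT_AUTHOR_EMAIL='author@example.com'
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL

# A child process now sees both values.
sh -c 'printf "%s <%s>\n" "$GIT_AUTHOR_NAME" "$GIT_AUTHOR_EMAIL"'
```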
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you have a file called HEAD in your work tree, many commands that
our scripts feed "HEAD" to would complain about the rev vs path
ambiguity. A solution is to form the command lines more carefully by
appending -- to them, which makes it clear that we mean the HEAD revision,
not a file named HEAD.
This patch would apply to maint.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The helper functions 'map' and 'skip_commit' were provided to commit
filters by sourcing filter-branch itself. This was done with a certain
environment variable set to indicate that only the functions should be
defined, and the script should return then.
This was really hacky, and it did not work all that well, since the
full path to git-filter-branch was not known at all times.
Avoid that by putting the functions into a variable, and eval'ing
that variable. The commit filter gets these functions by prepending
the variable to the specified commands.
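The following self-contained sketch shows that technique. The helper's source text lives in a variable. It is eval'ed in the current shell and prepended to a filter that runs in a child shell. The map body, the mapdir directory and user_filter are hypothetical simplifications, not the script's real code.

```sh
#!/bin/sh
# Sketch: hand helper functions to filters by keeping their source
# text in a variable.  The map body, mapdir and user_filter are
# hypothetical simplifications for this example.
mapdir=$(mktemp -d) || exit 1
echo new-id > "$mapdir/old-id"      # pretend old-id was rewritten to new-id
export mapdir

functions='
map()
{
	# print the rewritten id if there is one, else the original id
	if test -r "$mapdir/$1"
	then
		cat "$mapdir/$1"
	else
		echo "$1"
	fi
}
'

eval "$functions"                   # defines map() in this shell
here=$(map old-id)

# Prepend the same text to a user-supplied filter run in a child shell.
user_filter='map old-id; map other-id'
there=$(sh -c "$functions$user_filter")

rm -rf "$mapdir"
printf '%s\n' "$here" "$there"
```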
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These commands currently lack OPTIONS_SPEC; allow people to
easily list, with "git grep 'OPTIONS_SPEC=$'", what they can help
improve.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier, "git filter-branch --<options> HEAD" would not update the
working tree after rewriting the branch. This commit fixes it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
A lot of shell scripts contained stuff starting with
while case "$#" in 0) break ;; esac
and similar. I consider breaking out of the condition instead of the
body of the loop ugly, and the implied "true" value of the
non-matching case is not really obvious to humans at first glance. It
happens not to be obvious to some BSD shells, either, but that's
because they are not POSIX-compliant. In most cases, this has been
replaced by a straight condition using "test". "case" has the
advantage of being faster than "test" on vintage shells where "test"
is not a builtin. Since none of them is likely to run the git
scripts, anyway, the added readability should be worth the change.
A few loops have had their termination condition expressed
differently.
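The sketch below shows the two loop styles side by side; count_args_old and count_args are hypothetical names. The old form works because a case statement with no matching pattern returns 0. That keeps the loop going until the 0) pattern breaks out of the condition.

```sh
#!/bin/sh
# Old idiom: the loop ends by breaking out of the *condition*; a case
# with no matching pattern has exit status 0, so the body runs.
count_args_old() {
	n=0
	while case "$#" in 0) break ;; esac
	do
		n=$((n + 1))
		shift
	done
	echo "$n"
}

# Replacement: an explicit test in the condition.
count_args() {
	n=0
	while test $# != 0
	do
		n=$((n + 1))
		shift
	done
	echo "$n"
}

count_args_old a b c
count_args a b c
```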
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With this function, a commit filter can leave out unwanted commits
(such as temporary commits). It does _not_ undo the changeset
corresponding to that commit, but it _skips_ the revision. IOW
no tree object is changed by this.
If you like to commit early and often, but want to filter out all
intermediate commits, marked by "@@@" in the commit message, you can
now do this with
git filter-branch --commit-filter '
	if git cat-file commit "$GIT_COMMIT" | grep "@@@" > /dev/null
	then
		skip_commit "$@";
	else
		git commit-tree "$@";
	fi' newbranch
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the convenience functions to the top of git-filter-branch.sh, and
return from the script when the environment variable SOURCE_FUNCTIONS is
set.
By sourcing git-filter-branch with that variable set automatically, all
commit filters may access the convenience functions like "map".
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Later in a loop any existing ref whose path begins with it is
removed. It would be a disaster if you allowed it to say refs/head
for example.
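The message does not say which prefix "it" is. The sketch below only illustrates the hazard it describes. A plain string-prefix test treats refs/head as a prefix of refs/heads/master. A prefix that ends in a slash only matches refs below that directory. The helper has_prefix is hypothetical.

```sh
#!/bin/sh
# Illustration of prefix matching on ref names; has_prefix is a
# hypothetical helper, not code from filter-branch.
has_prefix() {
	# succeed if $1 starts with the literal string $2
	case "$1" in
	"$2"*) return 0 ;;
	*) return 1 ;;
	esac
}

has_prefix refs/heads/master refs/head && echo 'refs/head matches a branch'
has_prefix refs/heads/master refs/head/ || echo 'refs/head/ does not'
```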
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Remove "DESTBRANCH" from usage, as it rewrites the branches given.
- Remove an = from an example usage, as the script doesn't understand
it.
Signed-off-by: Brian Gernhardt <benji@silverinsanity.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Quite a few of the scripts are rather careless about using GIT_DIR
while changing directories.
Some try their hand (with varying degrees of success) at making
GIT_DIR absolute.
This patch lets git-sh-setup.sh cater for absolute directories (in a
way that should work reliably also with non-Unix path names) and
removes the respective kludges in git-filter-branch.sh and
git-instaweb.sh.
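One way to make a directory absolute without inspecting the path string is to let the shell resolve it with cd and pwd. With this approach, a path such as C:/repo is never mistaken for a relative one by a leading-slash test. This is a sketch with a hypothetical helper name. It is not necessarily how git-sh-setup.sh does it.

```sh
#!/bin/sh
# Sketch: resolve a directory to an absolute path by letting the shell
# do it, instead of testing the string for a leading '/'.
# absolute_dir is a hypothetical helper name.
absolute_dir() {
	(cd "$1" && pwd)
}

absolute_dir .      # prints the current directory
```

A caller could then write GIT_DIR=$(absolute_dir "$GIT_DIR") || exit, which fails loudly if the directory does not exist.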
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On e.g. Ubuntu, dash is used as /bin/sh. Unlike bash it parses
commands like
a=$((echo stuff) | wc)
as an arithmetic expression while what we want is a subshell inside
a command substitution. Resolve the ambiguity by placing a space
between the two opening parentheses.
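Here is a small sketch of the disambiguated form, next to a genuine arithmetic expansion. The variable names are arbitrary.

```sh
#!/bin/sh
# "$((" would start an arithmetic expansion in shells such as dash;
# the space after "$(" makes it a command substitution holding a subshell.
last=$( (echo stuff; echo more) | sed -n '$p')
echo "$last"

# A real arithmetic expansion, for contrast.
sum=$((1 + 2))
echo "$sum"
```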
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We used to take the first non-option argument as the name for the new
branch. This syntax is not extensible to support rewriting more than just
HEAD.
Instead, we now have the following syntax:
git filter-branch [<filter options>...] [<rev-list options>]
All positive refs given in <rev-list options> are rewritten. Yes,
in-place. If a ref was changed, the original head is stored in
refs/original/$ref now, for your inspecting pleasure, in addition to the
reflogs (since it is easier to inspect "git show-ref | grep original" than
to inspect all the reflogs).
This commit also adds the --force option to remove .git-rewrite/ and all
refs from refs/original/ before filtering.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was reported by Alex Riesen that "set -e" can break something as
trivial as "unset CDPATH" in bash.
So get rid of "set -e".
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace uses of cat that do nothing but write the contents of
a single file to another command via a pipe.
[jc: Original patch from Josh was somewhat buggy and rewrote
"cat $file | wc -l" to "wc -l $file", but this one should be Ok.]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
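The cat-removal change above can be sketched with a scratch file. The sketch also shows why rewriting "cat $file | wc -l" as "wc -l $file" is not equivalent. Given a file operand, wc prints the file name after the count. With redirected input, it prints the count alone.

```sh
#!/bin/sh
# Sketch: pass a single file to a command by redirection instead of
# "cat file | cmd".  A scratch file stands in for real input.
f=$(mktemp) || exit 1
printf 'first\nsecond\n' > "$f"

# Was: cat "$f" | wc -l
lines=$(wc -l < "$f")       # the count only (some wc pad it with blanks)

# Not equivalent: with a file operand, wc also prints the file name.
named=$(wc -l "$f")

echo $lines                 # unquoted on purpose to drop any padding
echo "$named"
rm -f "$f"
```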
A common mistake is to provide a filter which fails unintentionally. For
example, this will stop in the middle:
git filter-branch --env-filter '
	test "$GIT_COMMITTER_EMAIL" = xyz &&
	GIT_COMMITTER_EMAIL=abc && export GIT_COMMITTER_EMAIL' rewritten
When $GIT_COMMITTER_EMAIL is not "xyz", the test fails, and consequently
the whole filter has a non-zero exit status. However, as demonstrated
in this example, filter-branch would just stop, and the user would be
none the wiser.
Also, a failing msg-filter would not have been caught, as was the
case with one of the tests.
This patch fixes both issues, by paying attention to the exit status
of msg-filter, and by saying what failed before exiting.
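The sketch below illustrates both points. For illustration only, it runs each filter with sh -c, which is an assumption; filter-branch itself may evaluate filters differently. The &&-chained filter exits non-zero for every commit it is not meant to change. An if statement with no else exits 0. A hypothetical run_filter wrapper reports which filter failed.

```sh
#!/bin/sh
# Sketch; run_filter is a hypothetical wrapper, not filter-branch code.
GIT_COMMITTER_EMAIL=someone@example.com
export GIT_COMMITTER_EMAIL

run_filter() {
	# $1: filter name used in the message, $2: shell code to run
	sh -c "$2" || {
		echo "$1 filter failed: $2" >&2
		return 1
	}
}

# Exits non-zero whenever the e-mail is not "xyz".
fragile='test "$GIT_COMMITTER_EMAIL" = xyz &&
GIT_COMMITTER_EMAIL=abc && export GIT_COMMITTER_EMAIL'

# An if without else whose condition is false exits with status 0.
robust='if test "$GIT_COMMITTER_EMAIL" = xyz
then
	GIT_COMMITTER_EMAIL=abc && export GIT_COMMITTER_EMAIL
fi'

fragile_status=0
run_filter env "$fragile" || fragile_status=$?
robust_status=0
run_filter env "$robust" || robust_status=$?
echo "fragile: $fragile_status, robust: $robust_status"
```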
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
--tag-name-filter may have failed before because
warn is used for reporting but was not available.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of filling the screen with progress lines, use \r so that
the progress can be seen, but warning messages are more visible.
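Here is a sketch of the \r technique. The message format and commit ids are made up for the example.

```sh
#!/bin/sh
# Sketch: overwrite one status line with "\r" instead of printing a
# new line per commit, so later warnings stand out.
show_progress() {
	total=$#
	i=0
	for commit
	do
		i=$((i + 1))
		printf 'Rewrite %s (%d/%d)\r' "$commit" "$i" "$total"
	done
	echo                        # finish the status line
}

show_progress aaa bbb ccc
```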
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the map function didn't find the rewritten commit of the passed in
original id, it printed the original id, but it still fell through to
the 'cat', which failed with an error message.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This moves the documentation in git-filter-branch.sh to its own
man page, with a few touch ups (incorporating comments by Frank
Lichtenheld).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I realize that a lot of people use the "git-xyzzy" format, and we have
various historical reasons for it, but I also think that most people have
long since started thinking of the git command as a single command with
various subcommands, and we've long had the documentation talk about it
that way.
Slowly migrating away from the git-xyzzy format would allow us to
eventually no longer install hundreds of binaries (even if most of them
are symlinks or hardlinks) in users' $PATH, and the _original_ reasons for
it (implementation issues and bash completion) are really long long gone.
Using "git xyzzy" also has some fundamental advantages, like the ability
to specify things like paging ("git -p xyzzy") and making the whole notion
of aliases act like other git commands (which they already do, but they do
*not* have a "git-xyzzy" form!)
Anyway, while actually removing the "git-xyzzy" things is not practical
right now, we can certainly start slowly to deprecate it internally inside
git itself - in the shell scripts we use, and the test vectors.
This patch adds a "remove-dashes" makefile target, which does that. It
isn't particularly efficient or smart, but it *does* successfully rewrite
a lot of our shell scripts to use the "git xyzzy" form for all built-in
commands.
(For non-builtins, the "git xyzzy" format implies an extra execve(), so
this script leaves those alone).
So apply this patch, and then run
make remove-dashes
make test
git commit -a
to generate a much larger patch that actually starts this transformation.
(The only half-way subtle thing about this is that it also fixes up
git-filter-branch.sh for the new world order by adding quoting around
the use of "git-commit-tree" as an argument. It doesn't need it in that
format, but when changed into "git commit-tree" it is no longer a single
word, and the quoting maintains the old behaviour).
NOTE! This does not yet mean that you can actually stop installing the
"git-xyzzy" binaries for the builtins. There are some remaining places
that want to use the old form, this just removes the most obvious ones
that can easily be done automatically.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is based on Jeff King's example in
20070621130137.GB4487@coredump.intra.peff.net
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When two branches are merged that modify a subdirectory (possibly in
different intermediate steps) such that both end up identical, then
rev-list chooses only one branch. But when we filter history, we want to
keep both branches. Therefore, we must use --full-history.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can use git rev-list --parents when we list the commits to rewrite.
It is not necessary to run git rev-list --parents for each commit in the
loop.
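git rev-list --parents prints each commit followed by its parent ids on the same line. A single listing therefore gives the loop everything it needs. The sketch below reads such lines from made-up sample input, with no repository involved. The name list_parents is hypothetical.

```sh
#!/bin/sh
# Sketch: consume "<commit> <parent>..." lines such as
# "git rev-list --parents" prints; the input below is made up.
list_parents() {
	while read -r commit parents
	do
		set -- $parents     # word-split the parent ids
		echo "$commit has $# parent(s)"
	done
}

printf '%s\n' 'c1' 'c2 c1' 'm c2 b1' | list_parents
```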
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>