git/builtin
Jeff King a872275098 teach fast-export an --anonymize option
Sometimes users want to report a bug they experience on
their repository, but they are not at liberty to share the
contents of the repository. It would be useful if they could
produce a repository that has a similar shape to its history
and tree, but without leaking any information. This
"anonymized" repository could then be shared with developers
(assuming it still replicates the original problem).

This patch implements an "--anonymize" option to
fast-export, which generates a stream that can recreate such
a repository. Producing a single stream makes it easy for
the caller to verify that they are not leaking any useful
information. You can get an overview of what will be shared
by running a command like:

  git fast-export --anonymize --all |
  perl -pe 's/\d+/X/g' |
  sort -u |
  less

which will show every unique line we generate, modulo any
numbers (each anonymized token is assigned a number, like
"User 0", and we replace it consistently in the output).

In addition to anonymizing, this produces test cases that
are relatively small (compared to the original repository)
and fast to generate (compared to using filter-branch, or
modifying the output of fast-export yourself). Here are
numbers for git.git:

  $ time git fast-export --anonymize --all \
         --tag-of-filtered-object=drop >output
  real    0m2.883s
  user    0m2.828s
  sys     0m0.052s

  $ gzip output
  $ ls -lh output.gz | awk '{print $5}'
  2.9M

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-27 10:42:16 -07:00
..
add.c read-cache: new API write_locked_index instead of write_index/write_cache 2014-06-13 11:49:10 -07:00
annotate.c annotate: use argv_array 2014-07-16 11:10:11 -07:00
apply.c Merge branch 'jk/misc-fixes-maint' 2014-07-28 11:30:41 -07:00
archive.c
bisect--helper.c
blame.c Merge branch 'rs/code-cleaning' 2014-07-22 10:59:37 -07:00
branch.c refactor skip_prefix to return a boolean 2014-06-20 10:44:43 -07:00
bundle.c
cat-file.c
check-attr.c
check-ignore.c
check-mailmap.c
check-ref-format.c
checkout-index.c entry.c: update cache_changed if refresh_cache is set in checkout_entry() 2014-06-13 11:49:39 -07:00
checkout.c Merge branch 'nd/split-index' 2014-07-16 11:25:40 -07:00
clean.c use xcalloc() to allocate zero-initialized memory 2014-07-21 10:30:21 -07:00
clone.c use local cloning if insteadOf makes a local URL 2014-07-17 11:17:13 -07:00
column.c
commit-tree.c commit_tree: take a pointer/len pair rather than a const strbuf 2014-06-12 10:29:41 -07:00
commit.c Merge branch 'ta/string-list-init' 2014-07-23 11:35:54 -07:00
config.c Merge branch 'jk/daemon-tolower' 2014-06-16 10:07:15 -07:00
count-objects.c
credential.c
describe.c hashmap: add simplified hashmap_get_from_hash() API 2014-07-07 13:56:35 -07:00
diff-files.c
diff-index.c
diff-tree.c Merge branch 'jk/alloc-commit-id' 2014-07-22 10:59:25 -07:00
diff.c
fast-export.c teach fast-export an --anonymize option 2014-08-27 10:42:16 -07:00
fetch-pack.c
fetch.c Merge branch 'jk/xstrfmt' 2014-07-09 11:34:05 -07:00
fmt-merge-msg.c Merge branch 'jk/xstrfmt' 2014-07-09 11:34:05 -07:00
for-each-ref.c use commit_list_count() to count the members of commit_lists 2014-07-17 13:36:25 -07:00
fsck.c refs.c: add a public is_branch function 2014-07-16 13:06:41 -07:00
gc.c Merge branch 'nd/daemonize-gc' into maint 2014-06-25 11:47:36 -07:00
get-tar-commit-id.c
grep.c Merge branch 'sk/spawn-less-case-insensitively-from-grep-O-i' into maint 2014-06-25 11:47:49 -07:00
hash-object.c
help.c
index-pack.c Merge branch 'maint' 2014-07-21 12:35:39 -07:00
init-db.c i18n: only extract comments marked with "TRANSLATORS:" 2014-04-17 11:09:56 -07:00
log.c Merge branch 'jk/commit-buffer-length' into maint 2014-07-16 11:16:38 -07:00
ls-files.c
ls-remote.c builtin/ls-remote.c: rearrange xcalloc arguments 2014-05-27 14:00:43 -07:00
ls-tree.c
mailinfo.c Merge branch 'rs/mailinfo-header-cmp' into maint 2014-06-25 11:48:23 -07:00
mailsplit.c
merge-base.c
merge-file.c
merge-index.c
merge-ours.c
merge-recursive.c
merge-tree.c
merge.c Merge branch 'rs/code-cleaning' 2014-07-16 11:33:09 -07:00
mktag.c
mktree.c
mv.c Merge branch 'nd/split-index' 2014-07-16 11:25:40 -07:00
name-rev.c use xstrfmt to replace xmalloc + strcpy/strcat 2014-06-19 15:20:54 -07:00
notes.c Merge branch 'mh/ref-transaction' 2014-06-03 12:06:41 -07:00
pack-objects.c Merge branch 'jk/repack-pack-writebitmaps-config' 2014-06-25 12:23:19 -07:00
pack-redundant.c
pack-refs.c
patch-id.c patch-id: make it stable against hunk reordering 2014-06-10 13:09:24 -07:00
prune-packed.c
prune.c
push.c refactor skip_prefix to return a boolean 2014-06-20 10:44:43 -07:00
read-tree.c read-tree: note about dropping split-index mode or index version 2014-06-13 11:49:41 -07:00
receive-pack.c Merge branch 'jk/misc-fixes-maint' 2014-07-28 11:30:41 -07:00
reflog.c refs.c: add new functions reflog_exists and delete_reflog 2014-05-08 14:31:43 -07:00
remote-ext.c
remote-fd.c
remote.c Merge branch 'rs/ref-transaction-0' 2014-07-21 11:18:37 -07:00
repack.c Merge branch 'jk/strip-suffix' 2014-07-16 11:26:00 -07:00
replace.c Merge branch 'cc/replace-graft' 2014-07-27 15:14:18 -07:00
rerere.c rerere: fix for merge.conflictstyle 2014-04-30 10:30:02 -07:00
reset.c Merge branch 'nd/split-index' 2014-07-16 11:25:40 -07:00
rev-list.c commit: record buffer length in cache 2014-06-13 12:09:38 -07:00
rev-parse.c Merge branch 'jk/misc-fixes-maint' 2014-07-28 11:30:41 -07:00
revert.c
rm.c read-cache: new API write_locked_index instead of write_index/write_cache 2014-06-13 11:49:10 -07:00
send-pack.c
shortlog.c
show-branch.c Merge branch 'jk/misc-fixes-maint' 2014-07-28 11:30:41 -07:00
show-ref.c
stripspace.c
symbolic-ref.c
tag.c Merge branch 'jk/tag-sort' 2014-07-23 11:35:45 -07:00
unpack-file.c
unpack-objects.c
update-index.c Merge branch 'nd/split-index' 2014-07-16 11:25:40 -07:00
update-ref.c refs.c: change ref_transaction_update() to do error checking and return status 2014-07-14 11:54:42 -07:00
update-server-info.c
upload-archive.c
var.c
verify-commit.c verify-commit: scriptable commit signature verification 2014-06-23 15:50:31 -07:00
verify-pack.c verify-pack: use strbuf_strip_suffix 2014-06-30 13:43:32 -07:00
verify-tag.c
write-tree.c