You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
879 lines
28 KiB
879 lines
28 KiB
= My First Object Walk |
|
|
|
== What's an Object Walk? |
|
|
|
The object walk is a key concept in Git - this is the process that underpins |
|
operations like object transfer and fsck. Beginning from a given commit, the |
|
list of objects is found by walking parent relationships between commits (commit |
|
X based on commit W) and containment relationships between objects (tree Y is |
|
contained within commit X, and blob Z is located within tree Y, giving our |
|
working tree for commit X something like `y/z.txt`). |
|
|
|
A related concept is the revision walk, which is focused on commit objects and |
|
their parent relationships and does not delve into other object types. The |
|
revision walk is used for operations like `git log`. |
|
|
|
=== Related Reading |
|
|
|
- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of |
|
the revision walker in its various incarnations. |
|
- `revision.h` |
|
- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists] |
|
gives a good overview of the types of objects in Git and what your object |
|
walk is really describing. |
|
|
|
== Setting Up |
|
|
|
Create a new branch from `master`. |
|
|
|
---- |
|
git checkout -b revwalk origin/master |
|
---- |
|
|
|
We'll put our fiddling into a new command. For fun, let's name it `git walken`. |
|
Open up a new file `builtin/walken.c` and set up the command handler: |
|
|
|
---- |
|
/* |
|
* "git walken" |
|
* |
|
* Part of the "My First Object Walk" tutorial. |
|
*/ |
|
|
|
#include "builtin.h" |
|
|
|
int cmd_walken(int argc, const char **argv, const char *prefix) |
|
{ |
|
trace_printf(_("cmd_walken incoming...\n")); |
|
return 0; |
|
} |
|
---- |
|
|
|
NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or |
|
off at runtime. For the purposes of this tutorial, we will write `walken` as |
|
though it is intended for use as a "plumbing" command: that is, a command which |
|
is used primarily in scripts, rather than interactively by humans (a "porcelain" |
|
command). So we will send our debug output to `trace_printf()` instead. When |
|
running, enable trace output by setting the environment variable `GIT_TRACE`. |
|
|
|
Add usage text and `-h` handling, like all subcommands should consistently do |
|
(our test suite will notice and complain if you fail to do so). |
|
We'll need to include the `parse-options.h` header. |
|
|
|
---- |
|
#include "parse-options.h" |
|
|
|
... |
|
|
|
int cmd_walken(int argc, const char **argv, const char *prefix) |
|
{ |
|
const char * const walken_usage[] = { |
|
N_("git walken"), |
|
NULL, |
|
}; |
|
struct option options[] = { |
|
OPT_END() |
|
}; |
|
|
|
argc = parse_options(argc, argv, prefix, options, walken_usage, 0); |
|
|
|
... |
|
} |
|
---- |
|
|
|
Also add the relevant line in `builtin.h` near `cmd_whatchanged()`: |
|
|
|
---- |
|
int cmd_walken(int argc, const char **argv, const char *prefix); |
|
---- |
|
|
|
Include the command in `git.c` in `commands[]` near the entry for `whatchanged`, |
|
maintaining alphabetical ordering: |
|
|
|
---- |
|
{ "walken", cmd_walken, RUN_SETUP }, |
|
---- |
|
|
|
Add it to the `Makefile` near the line for `builtin/worktree.o`: |
|
|
|
---- |
|
BUILTIN_OBJS += builtin/walken.o |
|
---- |
|
|
|
Build and test out your command, without forgetting to ensure the `DEVELOPER` |
|
flag is set, and with `GIT_TRACE` enabled so the debug output can be seen: |
|
|
|
---- |
|
$ echo DEVELOPER=1 >>config.mak |
|
$ make |
|
$ GIT_TRACE=1 ./bin-wrappers/git walken |
|
---- |
|
|
|
NOTE: For a more exhaustive overview of the new command process, take a look at |
|
`Documentation/MyFirstContribution.txt`. |
|
|
|
NOTE: A reference implementation can be found at |
|
https://github.com/nasamuffin/git/tree/revwalk. |
|
|
|
=== `struct rev_cmdline_info` |
|
|
|
The definition of `struct rev_cmdline_info` can be found in `revision.h`. |
|
|
|
This struct is contained within the `rev_info` struct and is used to reflect |
|
parameters provided by the user over the CLI. |
|
|
|
`nr` represents the number of `rev_cmdline_entry` present in the array. |
|
|
|
`alloc` is used by the `ALLOC_GROW` macro. Check `cache.h` - this variable is |
|
used to track the allocated size of the list. |
|
|
|
Per entry, we find: |
|
|
|
`item` is the object provided upon which to base the object walk. Items in Git |
|
can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.) |
|
|
|
`name` is the object ID (OID) of the object - a hex string you may be familiar |
|
with from using Git to organize your source in the past. Check the tutorial |
|
mentioned above towards the top for a discussion of where the OID can come |
|
from. |
|
|
|
`whence` indicates some information about what to do with the parents of the |
|
specified object. We'll explore this flag more later on; take a look at |
|
`Documentation/revisions.txt` to get an idea of what could set the `whence` |
|
value. |
|
|
|
`flags` are used to hint the beginning of the revision walk and are the first |
|
block under the `#include`s in `revision.h`. The most likely ones to be set in |
|
the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags |
|
can be used during the walk, as well. |
|
|
|
=== `struct rev_info` |
|
|
|
This one is quite a bit longer, and many fields are only used during the walk |
|
by `revision.c` - not configuration options. Most of the configurable flags in |
|
`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a |
|
good idea to take some time and read through that document. |
|
|
|
== Basic Commit Walk |
|
|
|
First, let's see if we can replicate the output of `git log --oneline`. We'll |
|
refer back to the implementation frequently to discover norms when performing |
|
an object walk of our own. |
|
|
|
To do so, we'll first find all the commits, in order, which preceded the current |
|
commit. We'll extract the name and subject of the commit from each. |
|
|
|
Ideally, we will also be able to find out which ones are currently at the tip of |
|
various branches. |
|
|
|
=== Setting Up |
|
|
|
Preparing for your object walk has some distinct stages. |
|
|
|
1. Perform default setup for this mode, and others which may be invoked. |
|
2. Check configuration files for relevant settings. |
|
3. Set up the `rev_info` struct. |
|
4. Tweak the initialized `rev_info` to suit the current walk. |
|
5. Prepare the `rev_info` for the walk. |
|
6. Iterate over the objects, processing each one. |
|
|
|
==== Default Setups |
|
|
|
Before examining configuration files which may modify command behavior, set up |
|
default state for switches or options your command may have. If your command |
|
utilizes other Git components, ask them to set up their default states as well. |
|
For instance, `git log` takes advantage of `grep` and `diff` functionality, so |
|
its `init_log_defaults()` sets its own state (`decoration_style`) and asks |
|
`grep` and `diff` to initialize themselves by calling each of their |
|
initialization functions. |
|
|
|
==== Configuring From `.gitconfig` |
|
|
|
Next, we should have a look at any relevant configuration settings (i.e., |
|
settings readable and settable from `git config`). This is done by providing a |
|
callback to `git_config()`; within that callback, you can also invoke methods |
|
from other components you may need that need to intercept these options. Your |
|
callback will be invoked once per each configuration value which Git knows about |
|
(global, local, worktree, etc.). |
|
|
|
Similarly to the default values, we don't have anything to do here yet |
|
ourselves; however, we should call `git_default_config()` if we aren't calling |
|
any other existing config callbacks. |
|
|
|
Add a new function to `builtin/walken.c`. |
|
We'll also need to include the `config.h` header: |
|
|
|
---- |
|
#include "config.h" |
|
|
|
... |
|
|
|
static int git_walken_config(const char *var, const char *value, void *cb) |
|
{ |
|
/* |
|
* For now, we don't have any custom configuration, so fall back to |
|
* the default config. |
|
*/ |
|
return git_default_config(var, value, cb); |
|
} |
|
---- |
|
|
|
Make sure to invoke `git_config()` with it in your `cmd_walken()`: |
|
|
|
---- |
|
int cmd_walken(int argc, const char **argv, const char *prefix) |
|
{ |
|
... |
|
|
|
git_config(git_walken_config, NULL); |
|
|
|
... |
|
} |
|
---- |
|
|
|
==== Setting Up `rev_info` |
|
|
|
Now that we've gathered external configuration and options, it's time to |
|
initialize the `rev_info` object which we will use to perform the walk. This is |
|
typically done by calling `repo_init_revisions()` with the repository you intend |
|
to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info` |
|
struct. |
|
|
|
Add the `struct rev_info` and the `repo_init_revisions()` call. |
|
We'll also need to include the `revision.h` header: |
|
|
|
---- |
|
#include "revision.h" |
|
|
|
... |
|
|
|
int cmd_walken(int argc, const char **argv, const char *prefix) |
|
{ |
|
/* This can go wherever you like in your declarations.*/ |
|
struct rev_info rev; |
|
... |
|
|
|
/* This should go after the git_config() call. */ |
|
repo_init_revisions(the_repository, &rev, prefix); |
|
|
|
... |
|
} |
|
---- |
|
|
|
==== Tweaking `rev_info` For the Walk |
|
|
|
We're getting close, but we're still not quite ready to go. Now that `rev` is |
|
initialized, we can modify it to fit our needs. This is usually done within a |
|
helper for clarity, so let's add one: |
|
|
|
---- |
|
static void final_rev_info_setup(struct rev_info *rev) |
|
{ |
|
/* |
|
* We want to mimic the appearance of `git log --oneline`, so let's |
|
* force oneline format. |
|
*/ |
|
get_commit_format("oneline", rev); |
|
|
|
/* Start our object walk at HEAD. */ |
|
add_head_to_pending(rev); |
|
} |
|
---- |
|
|
|
[NOTE] |
|
==== |
|
Instead of using the shorthand `add_head_to_pending()`, you could do |
|
something like this: |
|
---- |
|
struct setup_revision_opt opt; |
|
|
|
memset(&opt, 0, sizeof(opt)); |
|
opt.def = "HEAD"; |
|
opt.revarg_opt = REVARG_COMMITTISH; |
|
setup_revisions(argc, argv, rev, &opt); |
|
---- |
|
Using a `setup_revision_opt` gives you finer control over your walk's starting |
|
point. |
|
==== |
|
|
|
Then let's invoke `final_rev_info_setup()` after the call to |
|
`repo_init_revisions()`: |
|
|
|
---- |
|
int cmd_walken(int argc, const char **argv, const char *prefix) |
|
{ |
|
... |
|
|
|
final_rev_info_setup(&rev); |
|
|
|
... |
|
} |
|
---- |
|
|
|
Later, we may wish to add more arguments to `final_rev_info_setup()`. But for |
|
now, this is all we need. |
|
|
|
==== Preparing `rev_info` For the Walk |
|
|
|
Now that `rev` is all initialized and configured, we've got one more setup step |
|
before we get rolling. We can do this in a helper, which will both prepare the |
|
`rev_info` for the walk, and perform the walk itself. Let's start the helper |
|
with the call to `prepare_revision_walk()`, which can return an error without |
|
dying on its own: |
|
|
|
---- |
|
static void walken_commit_walk(struct rev_info *rev) |
|
{ |
|
if (prepare_revision_walk(rev)) |
|
die(_("revision walk setup failed")); |
|
} |
|
---- |
|
|
|
NOTE: `die()` prints to `stderr` and exits the program. Since it will print to |
|
`stderr` it's likely to be seen by a human, so we will localize it. |
|
|
|
==== Performing the Walk! |
|
|
|
Finally! We are ready to begin the walk itself. Now we can see that `rev_info` |
|
can also be used as an iterator; we move to the next item in the walk by using |
|
`get_revision()` repeatedly. Add the listed variable declarations at the top and |
|
the walk loop below the `prepare_revision_walk()` call within your |
|
`walken_commit_walk()`: |
|
|
|
---- |
|
static void walken_commit_walk(struct rev_info *rev) |
|
{ |
|
struct commit *commit; |
|
struct strbuf prettybuf = STRBUF_INIT; |
|
|
|
... |
|
|
|
while ((commit = get_revision(rev))) { |
|
strbuf_reset(&prettybuf); |
|
pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf); |
|
puts(prettybuf.buf); |
|
} |
|
strbuf_release(&prettybuf); |
|
} |
|
---- |
|
|
|
NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the |
|
command we expect to be machine-parsed, we're sending it directly to stdout. |
|
|
|
Give it a shot. |
|
|
|
---- |
|
$ make |
|
$ ./bin-wrappers/git walken |
|
---- |
|
|
|
You should see all of the subject lines of all the commits in |
|
your tree's history, in order, ending with the initial commit, "Initial revision |
|
of "git", the information manager from hell". Congratulations! You've written |
|
your first revision walk. You can play with printing some additional fields |
|
from each commit if you're curious; have a look at the functions available in |
|
`commit.h`. |
|
|
|
=== Adding a Filter |
|
|
|
Next, let's try to filter the commits we see based on their author. This is |
|
equivalent to running `git log --author=<pattern>`. We can add a filter by |
|
modifying `rev_info.grep_filter`, which is a `struct grep_opt`. |
|
|
|
First some setup. Add `grep_config()` to `git_walken_config()`: |
|
|
|
---- |
|
static int git_walken_config(const char *var, const char *value, void *cb) |
|
{ |
|
grep_config(var, value, cb); |
|
return git_default_config(var, value, cb); |
|
} |
|
---- |
|
|
|
Next, we can modify the `grep_filter`. This is done with convenience functions |
|
found in `grep.h`. For fun, we're filtering to only commits from folks using a |
|
`gmail.com` email address - a not-very-precise guess at who may be working on |
|
Git as a hobby. Since we're checking the author, which is a specific line in the |
|
header, we'll use the `append_header_grep_pattern()` helper. We can use |
|
the `enum grep_header_field` to indicate which part of the commit header we want |
|
to search. |
|
|
|
In `final_rev_info_setup()`, add your filter line: |
|
|
|
---- |
|
static void final_rev_info_setup(int argc, const char **argv, |
|
const char *prefix, struct rev_info *rev) |
|
{ |
|
... |
|
|
|
append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, |
|
"gmail"); |
|
compile_grep_patterns(&rev->grep_filter); |
|
|
|
... |
|
} |
|
---- |
|
|
|
`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but |
|
it won't work unless we compile it with `compile_grep_patterns()`. |
|
|
|
NOTE: If you are using `setup_revisions()` (for example, if you are passing a |
|
`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need |
|
to call `compile_grep_patterns()` because `setup_revisions()` calls it for you. |
|
|
|
NOTE: We could add the same filter via the `append_grep_pattern()` helper if we |
|
wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and |
|
`enum grep_pat_token` for us. |
|
|
|
=== Changing the Order |
|
|
|
There are a few ways that we can change the order of the commits during a |
|
revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some |
|
typical orderings. |
|
|
|
`topo_order` is the same as `git log --topo-order`: we avoid showing a parent |
|
before all of its children have been shown, and we avoid mixing commits which |
|
are in different lines of history. (`git help log`'s section on `--topo-order` |
|
has a very nice diagram to illustrate this.) |
|
|
|
Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to |
|
`REV_SORT_BY_AUTHOR_DATE`. Add the following: |
|
|
|
---- |
|
static void final_rev_info_setup(int argc, const char **argv, |
|
const char *prefix, struct rev_info *rev) |
|
{ |
|
... |
|
|
|
rev->topo_order = 1; |
|
rev->sort_order = REV_SORT_BY_COMMIT_DATE; |
|
|
|
... |
|
} |
|
---- |
|
|
|
Let's output this into a file so we can easily diff it with the walk sorted by |
|
author date. |
|
|
|
---- |
|
$ make |
|
$ ./bin-wrappers/git walken > commit-date.txt |
|
---- |
|
|
|
Then, let's sort by author date and run it again. |
|
|
|
---- |
|
static void final_rev_info_setup(int argc, const char **argv, |
|
const char *prefix, struct rev_info *rev) |
|
{ |
|
... |
|
|
|
rev->topo_order = 1; |
|
rev->sort_order = REV_SORT_BY_AUTHOR_DATE; |
|
|
|
... |
|
} |
|
---- |
|
|
|
---- |
|
$ make |
|
$ ./bin-wrappers/git walken > author-date.txt |
|
---- |
|
|
|
Finally, compare the two. This is a little less helpful without object names or |
|
dates, but hopefully we get the idea. |
|
|
|
---- |
|
$ diff -u commit-date.txt author-date.txt |
|
---- |
|
|
|
This display indicates that commits can be reordered after they're written, for |
|
example with `git rebase`. |
|
|
|
Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag. |
|
Set that flag somewhere inside of `final_rev_info_setup()`: |
|
|
|
---- |
|
static void final_rev_info_setup(int argc, const char **argv, const char *prefix, |
|
struct rev_info *rev) |
|
{ |
|
... |
|
|
|
rev->reverse = 1; |
|
|
|
... |
|
} |
|
---- |
|
|
|
Run your walk again and note the difference in order. (If you remove the grep |
|
pattern, you should see the last commit this call gives you as your current |
|
HEAD.) |
|
|
|
== Basic Object Walk |
|
|
|
So far we've been walking only commits. But Git has more types of objects than |
|
that! Let's see if we can walk _all_ objects, and find out some information |
|
about each one. |
|
|
|
We can base our work on an example. `git pack-objects` prepares all kinds of |
|
objects for packing into a bitmap or packfile. The work we are interested in |
|
resides in `builtins/pack-objects.c:get_object_list()`; examination of that |
|
function shows that the all-object walk is being performed by |
|
`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two |
|
functions reside in `list-objects.c`; examining the source shows that, despite |
|
the name, these functions traverse all kinds of objects. Let's have a look at |
|
the arguments to `traverse_commit_list()`. |
|
|
|
- `struct rev_info *revs`: This is the `rev_info` used for the walk. If |
|
its `filter` member is not `NULL`, then `filter` contains information for |
|
how to filter the object list. |
|
- `show_commit_fn show_commit`: A callback which will be used to handle each |
|
individual commit object. |
|
- `show_object_fn show_object`: A callback which will be used to handle each |
|
non-commit object (so each blob, tree, or tag). |
|
- `void *show_data`: A context buffer which is passed in turn to `show_commit` |
|
and `show_object`. |
|
|
|
In addition, `traverse_commit_list_filtered()` has an additional paramter: |
|
|
|
- `struct oidset *omitted`: A linked-list of object IDs which the provided |
|
filter caused to be omitted. |
|
|
|
It looks like these methods use callbacks we provide instead of needing us |
|
to call it repeatedly ourselves. Cool! Let's add the callbacks first. |
|
|
|
For the sake of this tutorial, we'll simply keep track of how many of each kind |
|
of object we find. At file scope in `builtin/walken.c` add the following |
|
tracking variables: |
|
|
|
---- |
|
static int commit_count; |
|
static int tag_count; |
|
static int blob_count; |
|
static int tree_count; |
|
---- |
|
|
|
Commits are handled by a different callback than other objects; let's do that |
|
one first: |
|
|
|
---- |
|
static void walken_show_commit(struct commit *cmt, void *buf) |
|
{ |
|
commit_count++; |
|
} |
|
---- |
|
|
|
The `cmt` argument is fairly self-explanatory. But it's worth mentioning that |
|
the `buf` argument is actually the context buffer that we can provide to the |
|
traversal calls - `show_data`, which we mentioned a moment ago. |
|
|
|
Since we have the `struct commit` object, we can look at all the same parts that |
|
we looked at in our earlier commit-only walk. For the sake of this tutorial, |
|
though, we'll just increment the commit counter and move on. |
|
|
|
The callback for non-commits is a little different, as we'll need to check |
|
which kind of object we're dealing with: |
|
|
|
---- |
|
static void walken_show_object(struct object *obj, const char *str, void *buf) |
|
{ |
|
switch (obj->type) { |
|
case OBJ_TREE: |
|
tree_count++; |
|
break; |
|
case OBJ_BLOB: |
|
blob_count++; |
|
break; |
|
case OBJ_TAG: |
|
tag_count++; |
|
break; |
|
case OBJ_COMMIT: |
|
BUG("unexpected commit object in walken_show_object\n"); |
|
default: |
|
BUG("unexpected object type %s in walken_show_object\n", |
|
type_name(obj->type)); |
|
} |
|
} |
|
---- |
|
|
|
Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same |
|
context pointer that `walken_show_commit()` receives: the `show_data` argument |
|
to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally, |
|
`str` contains the name of the object, which ends up being something like |
|
`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag). |
|
|
|
To help assure us that we aren't double-counting commits, we'll include some |
|
complaining if a commit object is routed through our non-commit callback; we'll |
|
also complain if we see an invalid object type. Since those two cases should be |
|
unreachable, and would only change in the event of a semantic change to the Git |
|
codebase, we complain by using `BUG()` - which is a signal to a developer that |
|
the change they made caused unintended consequences, and the rest of the |
|
codebase needs to be updated to understand that change. `BUG()` is not intended |
|
to be seen by the public, so it is not localized. |
|
|
|
Our main object walk implementation is substantially different from our commit |
|
walk implementation, so let's make a new function to perform the object walk. We |
|
can perform setup which is applicable to all objects here, too, to keep separate |
|
from setup which is applicable to commit-only walks. |
|
|
|
We'll start by enabling all types of objects in the `struct rev_info`. We'll |
|
also turn on `tree_blobs_in_commit_order`, which means that we will walk a |
|
commit's tree and everything it points to immediately after we find each commit, |
|
as opposed to waiting for the end and walking through all trees after the commit |
|
history has been discovered. With the appropriate settings configured, we are |
|
ready to call `prepare_revision_walk()`. |
|
|
|
---- |
|
static void walken_object_walk(struct rev_info *rev) |
|
{ |
|
rev->tree_objects = 1; |
|
rev->blob_objects = 1; |
|
rev->tag_objects = 1; |
|
rev->tree_blobs_in_commit_order = 1; |
|
|
|
if (prepare_revision_walk(rev)) |
|
die(_("revision walk setup failed")); |
|
|
|
commit_count = 0; |
|
tag_count = 0; |
|
blob_count = 0; |
|
tree_count = 0; |
|
---- |
|
|
|
Let's start by calling just the unfiltered walk and reporting our counts. |
|
Complete your implementation of `walken_object_walk()`. |
|
We'll also need to include the `list-objects.h` header. |
|
|
|
---- |
|
#include "list-objects.h" |
|
|
|
... |
|
|
|
traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL); |
|
|
|
printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count, |
|
blob_count, tag_count, tree_count); |
|
} |
|
---- |
|
|
|
NOTE: This output is intended to be machine-parsed. Therefore, we are not |
|
sending it to `trace_printf()`, and we are not localizing it - we need scripts |
|
to be able to count on the formatting to be exactly the way it is shown here. |
|
If we were intending this output to be read by humans, we would need to localize |
|
it with `_()`. |
|
|
|
Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing |
|
command line options is out of scope for this tutorial, so we'll just hardcode |
|
a branch we can change at compile time. Where you call `final_rev_info_setup()` |
|
and `walken_commit_walk()`, instead branch like so: |
|
|
|
---- |
|
if (1) { |
|
add_head_to_pending(&rev); |
|
walken_object_walk(&rev); |
|
} else { |
|
final_rev_info_setup(argc, argv, prefix, &rev); |
|
walken_commit_walk(&rev); |
|
} |
|
---- |
|
|
|
NOTE: For simplicity, we've avoided all the filters and sorts we applied in |
|
`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you |
|
want, you can certainly use the filters we added before by moving |
|
`final_rev_info_setup()` out of the conditional and removing the call to |
|
`add_head_to_pending()`. |
|
|
|
Now we can try to run our command! It should take noticeably longer than the |
|
commit walk, but an examination of the output will give you an idea why. Your |
|
output should look similar to this example, but with different counts: |
|
|
|
---- |
|
Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees. |
|
---- |
|
|
|
This makes sense. We have more trees than commits because the Git project has |
|
lots of subdirectories which can change, plus at least one tree per commit. We |
|
have no tags because we started on a commit (`HEAD`) and while tags can point to |
|
commits, commits can't point to tags. |
|
|
|
NOTE: You will have different counts when you run this yourself! The number of |
|
objects grows along with the Git project. |
|
|
|
=== Adding a Filter |
|
|
|
There are a handful of filters that we can apply to the object walk laid out in |
|
`Documentation/rev-list-options.txt`. These filters are typically useful for |
|
operations such as creating packfiles or performing a partial clone. They are |
|
defined in `list-objects-filter-options.h`. For the purposes of this tutorial we |
|
will use the "tree:1" filter, which causes the walk to omit all trees and blobs |
|
which are not directly referenced by commits reachable from the commit in |
|
`pending` when the walk begins. (`pending` is the list of objects which need to |
|
be traversed during a walk; you can imagine a breadth-first tree traversal to |
|
help understand. In our case, that means we omit trees and blobs not directly |
|
referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only |
|
`HEAD` in the `pending` list.) |
|
|
|
For now, we are not going to track the omitted objects, so we'll replace those |
|
parameters with `NULL`. For the sake of simplicity, we'll add a simple |
|
build-time branch to use our filter or not. Preface the line calling |
|
`traverse_commit_list()` with the following, which will remind us which kind of |
|
walk we've just performed: |
|
|
|
---- |
|
if (0) { |
|
/* Unfiltered: */ |
|
trace_printf(_("Unfiltered object walk.\n")); |
|
} else { |
|
trace_printf( |
|
_("Filtered object walk with filterspec 'tree:1'.\n")); |
|
CALLOC_ARRAY(rev->filter, 1); |
|
parse_list_objects_filter(rev->filter, "tree:1"); |
|
} |
|
traverse_commit_list(rev, walken_show_commit, |
|
walken_show_object, NULL); |
|
---- |
|
|
|
The `rev->filter` member is usually built directly from a command |
|
line argument, so the module provides an easy way to build one from a string. |
|
Even though we aren't taking user input right now, we can still build one with |
|
a hardcoded string using `parse_list_objects_filter()`. |
|
|
|
With the filter spec "tree:1", we are expecting to see _only_ the root tree for |
|
each commit; therefore, the tree object count should be less than or equal to |
|
the number of commits. (For an example of why that's true: `git commit --revert` |
|
points to the same tree object as its grandparent.) |
|
|
|
=== Counting Omitted Objects |
|
|
|
We also have the capability to enumerate all objects which were omitted by a |
|
filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking |
|
`traverse_commit_list_filtered()` to populate the `omitted` list means that our |
|
object walk does not perform any better than an unfiltered object walk; all |
|
reachable objects are walked in order to populate the list. |
|
|
|
First, add the `struct oidset` and related items we will use to iterate it: |
|
|
|
---- |
|
static void walken_object_walk( |
|
... |
|
|
|
struct oidset omitted; |
|
struct oidset_iter oit; |
|
struct object_id *oid = NULL; |
|
int omitted_count = 0; |
|
oidset_init(&omitted, 0); |
|
|
|
... |
|
---- |
|
|
|
Modify the call to `traverse_commit_list_filtered()` to include your `omitted` |
|
object: |
|
|
|
---- |
|
... |
|
|
|
traverse_commit_list_filtered(rev, |
|
walken_show_commit, walken_show_object, NULL, &omitted); |
|
|
|
... |
|
---- |
|
|
|
Then, after your traversal, the `oidset` traversal is pretty straightforward. |
|
Count all the objects within and modify the print statement: |
|
|
|
---- |
|
/* Count the omitted objects. */ |
|
oidset_iter_init(&omitted, &oit); |
|
|
|
while ((oid = oidset_iter_next(&oit))) |
|
omitted_count++; |
|
|
|
printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n", |
|
commit_count, blob_count, tag_count, tree_count, omitted_count); |
|
---- |
|
|
|
By running your walk with and without the filter, you should find that the total |
|
object count in each case is identical. You can also time each invocation of |
|
the `walken` subcommand, with and without `omitted` being passed in, to confirm |
|
to yourself the runtime impact of tracking all omitted objects. |
|
|
|
=== Changing the Order |
|
|
|
Finally, let's demonstrate that you can also reorder walks of all objects, not |
|
just walks of commits. First, we'll make our handlers chattier - modify |
|
`walken_show_commit()` and `walken_show_object()` to print the object as they |
|
go: |
|
|
|
---- |
|
static void walken_show_commit(struct commit *cmt, void *buf) |
|
{ |
|
trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid)); |
|
commit_count++; |
|
} |
|
|
|
static void walken_show_object(struct object *obj, const char *str, void *buf) |
|
{ |
|
trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid)); |
|
|
|
... |
|
} |
|
---- |
|
|
|
NOTE: Since we will be examining this output directly as humans, we'll use |
|
`trace_printf()` here. Additionally, since this change introduces a significant |
|
number of printed lines, using `trace_printf()` will allow us to easily silence |
|
those lines without having to recompile. |
|
|
|
(Leave the counter increment logic in place.) |
|
|
|
With only that change, run again (but save yourself some scrollback): |
|
|
|
---- |
|
$ GIT_TRACE=1 ./bin-wrappers/git walken | head -n 10 |
|
---- |
|
|
|
Take a look at the top commit with `git show` and the object ID you printed; it |
|
should be the same as the output of `git show HEAD`. |
|
|
|
Next, let's change a setting on our `struct rev_info` within |
|
`walken_object_walk()`. Find where you're changing the other settings on `rev`, |
|
such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the |
|
`reverse` setting at the bottom: |
|
|
|
---- |
|
... |
|
|
|
rev->tree_objects = 1; |
|
rev->blob_objects = 1; |
|
rev->tag_objects = 1; |
|
rev->tree_blobs_in_commit_order = 1; |
|
rev->reverse = 1; |
|
|
|
... |
|
---- |
|
|
|
Now, run again, but this time, let's grab the last handful of objects instead |
|
of the first handful: |
|
|
|
---- |
|
$ make |
|
$ GIT_TRACE=1 ./bin-wrappers git walken | tail -n 10 |
|
---- |
|
|
|
The last commit object given should have the same OID as the one we saw at the |
|
top before, and running `git show <oid>` with that OID should give you again |
|
the same results as `git show HEAD`. Furthermore, if you run and examine the |
|
first ten lines again (with `head` instead of `tail` like we did before applying |
|
the `reverse` setting), you should see that now the first commit printed is the |
|
initial commit, `e83c5163`. |
|
|
|
== Wrapping Up |
|
|
|
Let's review. In this tutorial, we: |
|
|
|
- Built a commit walk from the ground up |
|
- Enabled a grep filter for that commit walk |
|
- Changed the sort order of that filtered commit walk |
|
- Built an object walk (tags, commits, trees, and blobs) from the ground up |
|
- Learned how to add a filter-spec to an object walk |
|
- Changed the display order of the filtered object walk
|
|
|