You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
907 lines
29 KiB
907 lines
29 KiB
5 years ago
|
= My First Object Walk
|
||
|
|
||
|
== What's an Object Walk?
|
||
|
|
||
|
The object walk is a key concept in Git - this is the process that underpins
|
||
|
operations like object transfer and fsck. Beginning from a given commit, the
|
||
|
list of objects is found by walking parent relationships between commits (commit
|
||
|
X based on commit W) and containment relationships between objects (tree Y is
|
||
|
contained within commit X, and blob Z is located within tree Y, giving our
|
||
|
working tree for commit X something like `y/z.txt`).
|
||
|
|
||
|
A related concept is the revision walk, which is focused on commit objects and
|
||
|
their parent relationships and does not delve into other object types. The
|
||
|
revision walk is used for operations like `git log`.
|
||
|
|
||
|
=== Related Reading
|
||
|
|
||
|
- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
|
||
|
the revision walker in its various incarnations.
|
||
|
- `Documentation/technical/api-revision-walking.txt`
|
||
|
- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
|
||
|
gives a good overview of the types of objects in Git and what your object
|
||
|
walk is really describing.
|
||
|
|
||
|
== Setting Up
|
||
|
|
||
|
Create a new branch from `master`.
|
||
|
|
||
|
----
|
||
|
git checkout -b revwalk origin/master
|
||
|
----
|
||
|
|
||
|
We'll put our fiddling into a new command. For fun, let's name it `git walken`.
|
||
|
Open up a new file `builtin/walken.c` and set up the command handler:
|
||
|
|
||
|
----
|
||
|
/*
|
||
|
* "git walken"
|
||
|
*
|
||
|
* Part of the "My First Object Walk" tutorial.
|
||
|
*/
|
||
|
|
||
|
#include "builtin.h"
|
||
|
|
||
|
int cmd_walken(int argc, const char **argv, const char *prefix)
|
||
|
{
|
||
|
trace_printf(_("cmd_walken incoming...\n"));
|
||
|
return 0;
|
||
|
}
|
||
|
----
|
||
|
|
||
|
NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
|
||
|
off at runtime. For the purposes of this tutorial, we will write `walken` as
|
||
|
though it is intended for use as a "plumbing" command: that is, a command which
|
||
|
is used primarily in scripts, rather than interactively by humans (a "porcelain"
|
||
|
command). So we will send our debug output to `trace_printf()` instead. When
|
||
|
running, enable trace output by setting the environment variable `GIT_TRACE`.
|
||
|
|
||
|
Add usage text and `-h` handling, like all subcommands should consistently do
|
||
|
(our test suite will notice and complain if you fail to do so).
|
||
|
|
||
|
----
|
||
|
int cmd_walken(int argc, const char **argv, const char *prefix)
|
||
|
{
|
||
|
const char * const walken_usage[] = {
|
||
|
N_("git walken"),
|
||
|
NULL,
|
||
|
}
|
||
|
struct option options[] = {
|
||
|
OPT_END()
|
||
|
};
|
||
|
|
||
|
argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
|
||
|
|
||
|
...
|
||
|
}
|
||
|
----
|
||
|
|
||
|
Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
|
||
|
|
||
|
----
|
||
|
int cmd_walken(int argc, const char **argv, const char *prefix);
|
||
|
----
|
||
|
|
||
|
Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
|
||
|
maintaining alphabetical ordering:
|
||
|
|
||
|
----
|
||
|
{ "walken", cmd_walken, RUN_SETUP },
|
||
|
----
|
||
|
|
||
|
Add it to the `Makefile` near the line for `builtin/worktree.o`:
|
||
|
|
||
|
----
|
||
|
BUILTIN_OBJS += builtin/walken.o
|
||
|
----
|
||
|
|
||
|
Build and test out your command, without forgetting to ensure the `DEVELOPER`
|
||
|
flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:
|
||
|
|
||
|
----
|
||
|
$ echo DEVELOPER=1 >>config.mak
|
||
|
$ make
|
||
|
$ GIT_TRACE=1 ./bin-wrappers/git walken
|
||
|
----
|
||
|
|
||
|
NOTE: For a more exhaustive overview of the new command process, take a look at
|
||
|
`Documentation/MyFirstContribution.txt`.
|
||
|
|
||
|
NOTE: A reference implementation can be found at
|
||
|
https://github.com/nasamuffin/git/tree/revwalk.
|
||
|
|
||
|
=== `struct rev_cmdline_info`
|
||
|
|
||
|
The definition of `struct rev_cmdline_info` can be found in `revision.h`.
|
||
|
|
||
|
This struct is contained within the `rev_info` struct and is used to reflect
|
||
|
parameters provided by the user over the CLI.
|
||
|
|
||
|
`nr` represents the number of `rev_cmdline_entry` present in the array.
|
||
|
|
||
|
`alloc` is used by the `ALLOC_GROW` macro. Check
|
||
|
`Documentation/technical/api-allocation-growing.txt` - this variable is used to
|
||
|
track the allocated size of the list.
|
||
|
|
||
|
Per entry, we find:
|
||
|
|
||
|
`item` is the object provided upon which to base the object walk. Items in Git
|
||
|
can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
|
||
|
|
||
|
`name` is the object ID (OID) of the object - a hex string you may be familiar
|
||
|
with from using Git to organize your source in the past. Check the tutorial
|
||
|
mentioned above towards the top for a discussion of where the OID can come
|
||
|
from.
|
||
|
|
||
|
`whence` indicates some information about what to do with the parents of the
|
||
|
specified object. We'll explore this flag more later on; take a look at
|
||
|
`Documentation/revisions.txt` to get an idea of what could set the `whence`
|
||
|
value.
|
||
|
|
||
|
`flags` are used to hint the beginning of the revision walk and are the first
|
||
|
block under the `#include`s in `revision.h`. The most likely ones to be set in
|
||
|
the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
|
||
|
can be used during the walk, as well.
|
||
|
|
||
|
=== `struct rev_info`
|
||
|
|
||
|
This one is quite a bit longer, and many fields are only used during the walk
|
||
|
by `revision.c` - not configuration options. Most of the configurable flags in
|
||
|
`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
|
||
|
good idea to take some time and read through that document.
|
||
|
|
||
|
== Basic Commit Walk
|
||
|
|
||
|
First, let's see if we can replicate the output of `git log --oneline`. We'll
|
||
|
refer back to the implementation frequently to discover norms when performing
|
||
|
an object walk of our own.
|
||
|
|
||
|
To do so, we'll first find all the commits, in order, which preceded the current
|
||
|
commit. We'll extract the name and subject of the commit from each.
|
||
|
|
||
|
Ideally, we will also be able to find out which ones are currently at the tip of
|
||
|
various branches.
|
||
|
|
||
|
=== Setting Up
|
||
|
|
||
|
Preparing for your object walk has some distinct stages.
|
||
|
|
||
|
1. Perform default setup for this mode, and others which may be invoked.
|
||
|
2. Check configuration files for relevant settings.
|
||
|
3. Set up the `rev_info` struct.
|
||
|
4. Tweak the initialized `rev_info` to suit the current walk.
|
||
|
5. Prepare the `rev_info` for the walk.
|
||
|
6. Iterate over the objects, processing each one.
|
||
|
|
||
|
==== Default Setups
|
||
|
|
||
|
Before examining configuration files which may modify command behavior, set up
|
||
|
default state for switches or options your command may have. If your command
|
||
|
utilizes other Git components, ask them to set up their default states as well.
|
||
|
For instance, `git log` takes advantage of `grep` and `diff` functionality, so
|
||
|
its `init_log_defaults()` sets its own state (`decoration_style`) and asks
|
||
|
`grep` and `diff` to initialize themselves by calling each of their
|
||
|
initialization functions.
|
||
|
|
||
|
For our first example within `git walken`, we don't intend to use any other
|
||
|
components within Git, and we don't have any configuration to do. However, we
|
||
|
may want to add some later, so for now, we can add an empty placeholder. Create
|
||
|
a new function in `builtin/walken.c`:
|
||
|
|
||
|
----
|
||
|
static void init_walken_defaults(void)
|
||
|
{
|
||
|
/*
|
||
|
* We don't actually need the same components `git log` does; leave this
|
||
|
* empty for now.
|
||
|
*/
|
||
|
}
|
||
|
----
|
||
|
|
||
|
Make sure to add a line invoking it inside of `cmd_walken()`.
|
||
|
|
||
|
----
|
||
|
int cmd_walken(int argc, const char **argv, const char *prefix)
|
||
|
{
|
||
|
init_walken_defaults();
|
||
|
}
|
||
|
----
|
||
|
|
||
|
==== Configuring From `.gitconfig`
|
||
|
|
||
|
Next, we should have a look at any relevant configuration settings (i.e.,
|
||
|
settings readable and settable from `git config`). This is done by providing a
|
||
|
callback to `git_config()`; within that callback, you can also invoke methods
|
||
|
from other components you may need that need to intercept these options. Your
|
||
|
callback will be invoked once per each configuration value which Git knows about
|
||
|
(global, local, worktree, etc.).
|
||
|
|
||
|
Similarly to the default values, we don't have anything to do here yet
|
||
|
ourselves; however, we should call `git_default_config()` if we aren't calling
|
||
|
any other existing config callbacks.
|
||
|
|
||
|
Add a new function to `builtin/walken.c`:
|
||
|
|
||
|
----
|
||
|
static int git_walken_config(const char *var, const char *value, void *cb)
|
||
|
{
|
||
|
/*
|
||
|
* For now, we don't have any custom configuration, so fall back to
|
||
|
* the default config.
|
||
|
*/
|
||
|
return git_default_config(var, value, cb);
|
||
|
}
|
||
|
----
|
||
|
|
||
|
Make sure to invoke `git_config()` with it in your `cmd_walken()`:
|
||
|
|
||
|
----
|
||
|
int cmd_walken(int argc, const char **argv, const char *prefix)
|
||
|
{
|
||
|
...
|
||
|
|
||
|
git_config(git_walken_config, NULL);
|
||
|
|
||
|
...
|
||
|
}
|
||
|
----
|
||
|
|
||
|
==== Setting Up `rev_info`
|
||
|
|
||
|
Now that we've gathered external configuration and options, it's time to
|
||
|
initialize the `rev_info` object which we will use to perform the walk. This is
|
||
|
typically done by calling `repo_init_revisions()` with the repository you intend
|
||
|
to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
|
||
|
struct.
|
||
|
|
||
|
Add the `struct rev_info` and the `repo_init_revisions()` call:
|
||
|
----
|
||
|
int cmd_walken(int argc, const char **argv, const char *prefix)
|
||
|
{
|
||
|
/* This can go wherever you like in your declarations.*/
|
||
|
struct rev_info rev;
|
||
|
...
|
||
|
|
||
|
/* This should go after the git_config() call. */
|
||
|
repo_init_revisions(the_repository, &rev, prefix);
|
||
|
|
||
|
...
|
||
|
}
|
||
|
----
|
||
|
|
||
|
==== Tweaking `rev_info` For the Walk
|
||
|
|
||
|
We're getting close, but we're still not quite ready to go. Now that `rev` is
|
||
|
initialized, we can modify it to fit our needs. This is usually done within a
|
||
|
helper for clarity, so let's add one:
|
||
|
|
||
|
----
|
||
|
static void final_rev_info_setup(struct rev_info *rev)
|
||
|
{
|
||
|
/*
|
||
|
* We want to mimic the appearance of `git log --oneline`, so let's
|
||
|
* force oneline format.
|
||
|
*/
|
||
|
get_commit_format("oneline", rev);
|
||
|
|
||
|
/* Start our object walk at HEAD. */
|
||
|
add_head_to_pending(rev);
|
||
|
}
|
||
|
----
|
||
|
|
||
|
[NOTE]
|
||
|
====
|
||
|
Instead of using the shorthand `add_head_to_pending()`, you could do
|
||
|
something like this:
|
||
|
----
|
||
|
struct setup_revision_opt opt;
|
||
|
|
||
|
memset(&opt, 0, sizeof(opt));
|
||
|
opt.def = "HEAD";
|
||
|
opt.revarg_opt = REVARG_COMMITTISH;
|
||
|
setup_revisions(argc, argv, rev, &opt);
|
||
|
----
|
||
|
Using a `setup_revision_opt` gives you finer control over your walk's starting
|
||
|
point.
|
||
|
====
|
||
|
|
||
|
Then let's invoke `final_rev_info_setup()` after the call to
|
||
|
`repo_init_revisions()`:
|
||
|
|
||
|
----
|
||
|
int cmd_walken(int argc, const char **argv, const char *prefix)
|
||
|
{
|
||
|
...
|
||
|
|
||
|
final_rev_info_setup(&rev);
|
||
|
|
||
|
...
|
||
|
}
|
||
|
----
|
||
|
|
||
|
Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
|
||
|
now, this is all we need.
|
||
|
|
||
|
==== Preparing `rev_info` For the Walk
|
||
|
|
||
|
Now that `rev` is all initialized and configured, we've got one more setup step
|
||
|
before we get rolling. We can do this in a helper, which will both prepare the
|
||
|
`rev_info` for the walk, and perform the walk itself. Let's start the helper
|
||
|
with the call to `prepare_revision_walk()`, which can return an error without
|
||
|
dying on its own:
|
||
|
|
||
|
----
|
||
|
static void walken_commit_walk(struct rev_info *rev)
|
||
|
{
|
||
|
if (prepare_revision_walk(rev))
|
||
|
die(_("revision walk setup failed"));
|
||
|
}
|
||
|
----
|
||
|
|
||
|
NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
|
||
|
`stderr` it's likely to be seen by a human, so we will localize it.
|
||
|
|
||
|
==== Performing the Walk!
|
||
|
|
||
|
Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
|
||
|
can also be used as an iterator; we move to the next item in the walk by using
|
||
|
`get_revision()` repeatedly. Add the listed variable declarations at the top and
|
||
|
the walk loop below the `prepare_revision_walk()` call within your
|
||
|
`walken_commit_walk()`:
|
||
|
|
||
|
----
|
||
|
static void walken_commit_walk(struct rev_info *rev)
|
||
|
{
|
||
|
struct commit *commit;
|
||
|
struct strbuf prettybuf = STRBUF_INIT;
|
||
|
|
||
|
...
|
||
|
|
||
|
while ((commit = get_revision(rev))) {
|
||
|
if (!commit)
|
||
|
continue;
|
||
|
|
||
|
strbuf_reset(&prettybuf);
|
||
|
pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
|
||
|
puts(prettybuf.buf);
|
||
|
}
|
||
|
strbuf_release(&prettybuf);
|
||
|
}
|
||
|
----
|
||
|
|
||
|
NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
|
||
|
command we expect to be machine-parsed, we're sending it directly to stdout.
|
||
|
|
||
|
Give it a shot.
|
||
|
|
||
|
----
|
||
|
$ make
|
||
|
$ ./bin-wrappers/git walken
|
||
|
----
|
||
|
|
||
|
You should see all of the subject lines of all the commits in
|
||
|
your tree's history, in order, ending with the initial commit, "Initial revision
|
||
|
of "git", the information manager from hell". Congratulations! You've written
|
||
|
your first revision walk. You can play with printing some additional fields
|
||
|
from each commit if you're curious; have a look at the functions available in
|
||
|
`commit.h`.
|
||
|
|
||
|
=== Adding a Filter
|
||
|
|
||
|
Next, let's try to filter the commits we see based on their author. This is
|
||
|
equivalent to running `git log --author=<pattern>`. We can add a filter by
|
||
|
modifying `rev_info.grep_filter`, which is a `struct grep_opt`.
|
||
|
|
||
|
First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add
|
||
|
`grep_config()` to `git_walken_config()`:
|
||
|
|
||
|
----
|
||
|
static void init_walken_defaults(void)
|
||
|
{
|
||
|
init_grep_defaults(the_repository);
|
||
|
}
|
||
|
|
||
|
...
|
||
|
|
||
|
static int git_walken_config(const char *var, const char *value, void *cb)
|
||
|
{
|
||
|
grep_config(var, value, cb);
|
||
|
return git_default_config(var, value, cb);
|
||
|
}
|
||
|
----
|
||
|
|
||
|
Next, we can modify the `grep_filter`. This is done with convenience functions
|
||
|
found in `grep.h`. For fun, we're filtering to only commits from folks using a
|
||
|
`gmail.com` email address - a not-very-precise guess at who may be working on
|
||
|
Git as a hobby. Since we're checking the author, which is a specific line in the
|
||
|
header, we'll use the `append_header_grep_pattern()` helper. We can use
|
||
|
the `enum grep_header_field` to indicate which part of the commit header we want
|
||
|
to search.
|
||
|
|
||
|
In `final_rev_info_setup()`, add your filter line:
|
||
|
|
||
|
----
|
||
|
static void final_rev_info_setup(int argc, const char **argv,
|
||
|
const char *prefix, struct rev_info *rev)
|
||
|
{
|
||
|
...
|
||
|
|
||
|
append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
|
||
|
"gmail");
|
||
|
compile_grep_patterns(&rev->grep_filter);
|
||
|
|
||
|
...
|
||
|
}
|
||
|
----
|
||
|
|
||
|
`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
|
||
|
it won't work unless we compile it with `compile_grep_patterns()`.
|
||
|
|
||
|
NOTE: If you are using `setup_revisions()` (for example, if you are passing a
|
||
|
`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
|
||
|
to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
|
||
|
|
||
|
NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
|
||
|
wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
|
||
|
`enum grep_pat_token` for us.
|
||
|
|
||
|
=== Changing the Order
|
||
|
|
||
|
There are a few ways that we can change the order of the commits during a
|
||
|
revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
|
||
|
typical orderings.
|
||
|
|
||
|
`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
|
||
|
before all of its children have been shown, and we avoid mixing commits which
|
||
|
are in different lines of history. (`git help log`'s section on `--topo-order`
|
||
|
has a very nice diagram to illustrate this.)
|
||
|
|
||
|
Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
|
||
|
`REV_SORT_BY_AUTHOR_DATE`. Add the following:
|
||
|
|
||
|
----
|
||
|
static void final_rev_info_setup(int argc, const char **argv,
|
||
|
const char *prefix, struct rev_info *rev)
|
||
|
{
|
||
|
...
|
||
|
|
||
|
rev->topo_order = 1;
|
||
|
rev->sort_order = REV_SORT_BY_COMMIT_DATE;
|
||
|
|
||
|
...
|
||
|
}
|
||
|
----
|
||
|
|
||
|
Let's output this into a file so we can easily diff it with the walk sorted by
|
||
|
author date.
|
||
|
|
||
|
----
|
||
|
$ make
|
||
|
$ ./bin-wrappers/git walken > commit-date.txt
|
||
|
----
|
||
|
|
||
|
Then, let's sort by author date and run it again.
|
||
|
|
||
|
----
|
||
|
static void final_rev_info_setup(int argc, const char **argv,
|
||
|
const char *prefix, struct rev_info *rev)
|
||
|
{
|
||
|
...
|
||
|
|
||
|
rev->topo_order = 1;
|
||
|
rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
|
||
|
|
||
|
...
|
||
|
}
|
||
|
----
|
||
|
|
||
|
----
|
||
|
$ make
|
||
|
$ ./bin-wrappers/git walken > author-date.txt
|
||
|
----
|
||
|
|
||
|
Finally, compare the two. This is a little less helpful without object names or
|
||
|
dates, but hopefully we get the idea.
|
||
|
|
||
|
----
|
||
|
$ diff -u commit-date.txt author-date.txt
|
||
|
----
|
||
|
|
||
|
This display indicates that commits can be reordered after they're written, for
|
||
|
example with `git rebase`.
|
||
|
|
||
|
Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
|
||
|
Set that flag somewhere inside of `final_rev_info_setup()`:
|
||
|
|
||
|
----
|
||
|
static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
|
||
|
struct rev_info *rev)
|
||
|
{
|
||
|
...
|
||
|
|
||
|
rev->reverse = 1;
|
||
|
|
||
|
...
|
||
|
}
|
||
|
----
|
||
|
|
||
|
Run your walk again and note the difference in order. (If you remove the grep
|
||
|
pattern, you should see the last commit this call gives you as your current
|
||
|
HEAD.)
|
||
|
|
||
|
== Basic Object Walk
|
||
|
|
||
|
So far we've been walking only commits. But Git has more types of objects than
|
||
|
that! Let's see if we can walk _all_ objects, and find out some information
|
||
|
about each one.
|
||
|
|
||
|
We can base our work on an example. `git pack-objects` prepares all kinds of
|
||
|
objects for packing into a bitmap or packfile. The work we are interested in
|
||
|
resides in `builtins/pack-objects.c:get_object_list()`; examination of that
|
||
|
function shows that the all-object walk is being performed by
|
||
|
`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
|
||
|
functions reside in `list-objects.c`; examining the source shows that, despite
|
||
|
the name, these functions traverse all kinds of objects. Let's have a look at
|
||
|
the arguments to `traverse_commit_list_filtered()`, which are a superset of the
|
||
|
arguments to the unfiltered version.
|
||
|
|
||
|
- `struct list_objects_filter_options *filter_options`: This is a struct which
|
||
|
stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
|
||
|
- `struct rev_info *revs`: This is the `rev_info` used for the walk.
|
||
|
- `show_commit_fn show_commit`: A callback which will be used to handle each
|
||
|
individual commit object.
|
||
|
- `show_object_fn show_object`: A callback which will be used to handle each
|
||
|
non-commit object (so each blob, tree, or tag).
|
||
|
- `void *show_data`: A context buffer which is passed in turn to `show_commit`
|
||
|
and `show_object`.
|
||
|
- `struct oidset *omitted`: A linked-list of object IDs which the provided
|
||
|
filter caused to be omitted.
|
||
|
|
||
|
It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
|
||
|
instead of needing us to call it repeatedly ourselves. Cool! Let's add the
|
||
|
callbacks first.
|
||
|
|
||
|
For the sake of this tutorial, we'll simply keep track of how many of each kind
|
||
|
of object we find. At file scope in `builtin/walken.c` add the following
|
||
|
tracking variables:
|
||
|
|
||
|
----
|
||
|
static int commit_count;
|
||
|
static int tag_count;
|
||
|
static int blob_count;
|
||
|
static int tree_count;
|
||
|
----
|
||
|
|
||
|
Commits are handled by a different callback than other objects; let's do that
|
||
|
one first:
|
||
|
|
||
|
----
|
||
|
static void walken_show_commit(struct commit *cmt, void *buf)
|
||
|
{
|
||
|
commit_count++;
|
||
|
}
|
||
|
----
|
||
|
|
||
|
The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
|
||
|
the `buf` argument is actually the context buffer that we can provide to the
|
||
|
traversal calls - `show_data`, which we mentioned a moment ago.
|
||
|
|
||
|
Since we have the `struct commit` object, we can look at all the same parts that
|
||
|
we looked at in our earlier commit-only walk. For the sake of this tutorial,
|
||
|
though, we'll just increment the commit counter and move on.
|
||
|
|
||
|
The callback for non-commits is a little different, as we'll need to check
|
||
|
which kind of object we're dealing with:
|
||
|
|
||
|
----
|
||
|
static void walken_show_object(struct object *obj, const char *str, void *buf)
|
||
|
{
|
||
|
switch (obj->type) {
|
||
|
case OBJ_TREE:
|
||
|
tree_count++;
|
||
|
break;
|
||
|
case OBJ_BLOB:
|
||
|
blob_count++;
|
||
|
break;
|
||
|
case OBJ_TAG:
|
||
|
tag_count++;
|
||
|
break;
|
||
|
case OBJ_COMMIT:
|
||
|
BUG("unexpected commit object in walken_show_object\n");
|
||
|
default:
|
||
|
BUG("unexpected object type %s in walken_show_object\n",
|
||
|
type_name(obj->type));
|
||
|
}
|
||
|
}
|
||
|
----
|
||
|
|
||
|
Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
|
||
|
context pointer that `walken_show_commit()` receives: the `show_data` argument
|
||
|
to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
|
||
|
`str` contains the name of the object, which ends up being something like
|
||
|
`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).
|
||
|
|
||
|
To help assure us that we aren't double-counting commits, we'll include some
|
||
|
complaining if a commit object is routed through our non-commit callback; we'll
|
||
|
also complain if we see an invalid object type. Since those two cases should be
|
||
|
unreachable, and would only change in the event of a semantic change to the Git
|
||
|
codebase, we complain by using `BUG()` - which is a signal to a developer that
|
||
|
the change they made caused unintended consequences, and the rest of the
|
||
|
codebase needs to be updated to understand that change. `BUG()` is not intended
|
||
|
to be seen by the public, so it is not localized.
|
||
|
|
||
|
Our main object walk implementation is substantially different from our commit
|
||
|
walk implementation, so let's make a new function to perform the object walk. We
|
||
|
can perform setup which is applicable to all objects here, too, to keep separate
|
||
|
from setup which is applicable to commit-only walks.
|
||
|
|
||
|
We'll start by enabling all types of objects in the `struct rev_info`. We'll
|
||
|
also turn on `tree_blobs_in_commit_order`, which means that we will walk a
|
||
|
commit's tree and everything it points to immediately after we find each commit,
|
||
|
as opposed to waiting for the end and walking through all trees after the commit
|
||
|
history has been discovered. With the appropriate settings configured, we are
|
||
|
ready to call `prepare_revision_walk()`.
|
||
|
|
||
|
----
|
||
|
static void walken_object_walk(struct rev_info *rev)
|
||
|
{
|
||
|
rev->tree_objects = 1;
|
||
|
rev->blob_objects = 1;
|
||
|
rev->tag_objects = 1;
|
||
|
rev->tree_blobs_in_commit_order = 1;
|
||
|
|
||
|
if (prepare_revision_walk(rev))
|
||
|
die(_("revision walk setup failed"));
|
||
|
|
||
|
commit_count = 0;
|
||
|
tag_count = 0;
|
||
|
blob_count = 0;
|
||
|
tree_count = 0;
|
||
|
----
|
||
|
|
||
|
Let's start by calling just the unfiltered walk and reporting our counts.
|
||
|
Complete your implementation of `walken_object_walk()`:
|
||
|
|
||
|
----
|
||
|
traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
|
||
|
|
||
|
printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
|
||
|
blob_count, tag_count, tree_count);
|
||
|
}
|
||
|
----
|
||
|
|
||
|
NOTE: This output is intended to be machine-parsed. Therefore, we are not
|
||
|
sending it to `trace_printf()`, and we are not localizing it - we need scripts
|
||
|
to be able to count on the formatting to be exactly the way it is shown here.
|
||
|
If we were intending this output to be read by humans, we would need to localize
|
||
|
it with `_()`.
|
||
|
|
||
|
Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
|
||
|
command line options is out of scope for this tutorial, so we'll just hardcode
|
||
|
a branch we can change at compile time. Where you call `final_rev_info_setup()`
|
||
|
and `walken_commit_walk()`, instead branch like so:
|
||
|
|
||
|
----
|
||
|
if (1) {
|
||
|
add_head_to_pending(&rev);
|
||
|
walken_object_walk(&rev);
|
||
|
} else {
|
||
|
final_rev_info_setup(argc, argv, prefix, &rev);
|
||
|
walken_commit_walk(&rev);
|
||
|
}
|
||
|
----
|
||
|
|
||
|
NOTE: For simplicity, we've avoided all the filters and sorts we applied in
|
||
|
`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
|
||
|
want, you can certainly use the filters we added before by moving
|
||
|
`final_rev_info_setup()` out of the conditional and removing the call to
|
||
|
`add_head_to_pending()`.
|
||
|
|
||
|
Now we can try to run our command! It should take noticeably longer than the
|
||
|
commit walk, but an examination of the output will give you an idea why. Your
|
||
|
output should look similar to this example, but with different counts:
|
||
|
|
||
|
----
|
||
|
Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees.
|
||
|
----
|
||
|
|
||
|
This makes sense. We have more trees than commits because the Git project has
|
||
|
lots of subdirectories which can change, plus at least one tree per commit. We
|
||
|
have no tags because we started on a commit (`HEAD`) and while tags can point to
|
||
|
commits, commits can't point to tags.
|
||
|
|
||
|
NOTE: You will have different counts when you run this yourself! The number of
|
||
|
objects grows along with the Git project.
|
||
|
|
||
|
=== Adding a Filter
|
||
|
|
||
|
There are a handful of filters that we can apply to the object walk laid out in
|
||
|
`Documentation/rev-list-options.txt`. These filters are typically useful for
|
||
|
operations such as creating packfiles or performing a partial clone. They are
|
||
|
defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
|
||
|
will use the "tree:1" filter, which causes the walk to omit all trees and blobs
|
||
|
which are not directly referenced by commits reachable from the commit in
|
||
|
`pending` when the walk begins. (`pending` is the list of objects which need to
|
||
|
be traversed during a walk; you can imagine a breadth-first tree traversal to
|
||
|
help understand. In our case, that means we omit trees and blobs not directly
|
||
|
referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
|
||
|
`HEAD` in the `pending` list.)
|
||
|
|
||
|
First, we'll need to `#include "list-objects-filter-options.h`" and set up the
|
||
|
`struct list_objects_filter_options` at the top of the function.
|
||
|
|
||
|
----
|
||
|
static void walken_object_walk(struct rev_info *rev)
|
||
|
{
|
||
|
struct list_objects_filter_options filter_options = {};
|
||
|
|
||
|
...
|
||
|
----
|
||
|
|
||
|
For now, we are not going to track the omitted objects, so we'll replace those
|
||
|
parameters with `NULL`. For the sake of simplicity, we'll add a simple
|
||
|
build-time branch to use our filter or not. Replace the line calling
|
||
|
`traverse_commit_list()` with the following, which will remind us which kind of
|
||
|
walk we've just performed:
|
||
|
|
||
|
----
|
||
|
if (0) {
|
||
|
/* Unfiltered: */
|
||
|
trace_printf(_("Unfiltered object walk.\n"));
|
||
|
traverse_commit_list(rev, walken_show_commit,
|
||
|
walken_show_object, NULL);
|
||
|
} else {
|
||
|
trace_printf(
|
||
|
_("Filtered object walk with filterspec 'tree:1'.\n"));
|
||
|
parse_list_objects_filter(&filter_options, "tree:1");
|
||
|
|
||
|
traverse_commit_list_filtered(&filter_options, rev,
|
||
|
walken_show_commit, walken_show_object, NULL, NULL);
|
||
|
}
|
||
|
----
|
||
|
|
||
|
`struct list_objects_filter_options` is usually built directly from a command
|
||
|
line argument, so the module provides an easy way to build one from a string.
|
||
|
Even though we aren't taking user input right now, we can still build one with
|
||
|
a hardcoded string using `parse_list_objects_filter()`.
|
||
|
|
||
|
With the filter spec "tree:1", we are expecting to see _only_ the root tree for
|
||
|
each commit; therefore, the tree object count should be less than or equal to
|
||
|
the number of commits. (For an example of why that's true: `git commit --revert`
|
||
|
points to the same tree object as its grandparent.)
|
||
|
|
||
|
=== Counting Omitted Objects
|
||
|
|
||
|
We also have the capability to enumerate all objects which were omitted by a
|
||
|
filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking
|
||
|
`traverse_commit_list_filtered()` to populate the `omitted` list means that our
|
||
|
object walk does not perform any better than an unfiltered object walk; all
|
||
|
reachable objects are walked in order to populate the list.
|
||
|
|
||
|
First, add the `struct oidset` and related items we will use to iterate it:
|
||
|
|
||
|
----
|
||
|
static void walken_object_walk(
|
||
|
...
|
||
|
|
||
|
struct oidset omitted;
|
||
|
struct oidset_iter oit;
|
||
|
struct object_id *oid = NULL;
|
||
|
int omitted_count = 0;
|
||
|
oidset_init(&omitted, 0);
|
||
|
|
||
|
...
|
||
|
----
|
||
|
|
||
|
Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
|
||
|
object:
|
||
|
|
||
|
----
|
||
|
...
|
||
|
|
||
|
traverse_commit_list_filtered(&filter_options, rev,
|
||
|
walken_show_commit, walken_show_object, NULL, &omitted);
|
||
|
|
||
|
...
|
||
|
----
|
||
|
|
||
|
Then, after your traversal, the `oidset` traversal is pretty straightforward.
|
||
|
Count all the objects within and modify the print statement:
|
||
|
|
||
|
----
|
||
|
/* Count the omitted objects. */
|
||
|
oidset_iter_init(&omitted, &oit);
|
||
|
|
||
|
while ((oid = oidset_iter_next(&oit)))
|
||
|
omitted_count++;
|
||
|
|
||
|
printf("commits %d\nblobs %d\ntags %d\ntrees%d\nomitted %d\n",
|
||
|
commit_count, blob_count, tag_count, tree_count, omitted_count);
|
||
|
----
|
||
|
|
||
|
By running your walk with and without the filter, you should find that the total
|
||
|
object count in each case is identical. You can also time each invocation of
|
||
|
the `walken` subcommand, with and without `omitted` being passed in, to confirm
|
||
|
to yourself the runtime impact of tracking all omitted objects.
|
||
|
|
||
|
=== Changing the Order
|
||
|
|
||
|
Finally, let's demonstrate that you can also reorder walks of all objects, not
|
||
|
just walks of commits. First, we'll make our handlers chattier - modify
|
||
|
`walken_show_commit()` and `walken_show_object()` to print the object as they
|
||
|
go:
|
||
|
|
||
|
----
|
||
|
static void walken_show_commit(struct commit *cmt, void *buf)
|
||
|
{
|
||
|
trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
|
||
|
commit_count++;
|
||
|
}
|
||
|
|
||
|
static void walken_show_object(struct object *obj, const char *str, void *buf)
|
||
|
{
|
||
|
trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
|
||
|
|
||
|
...
|
||
|
}
|
||
|
----
|
||
|
|
||
|
NOTE: Since we will be examining this output directly as humans, we'll use
|
||
|
`trace_printf()` here. Additionally, since this change introduces a significant
|
||
|
number of printed lines, using `trace_printf()` will allow us to easily silence
|
||
|
those lines without having to recompile.
|
||
|
|
||
|
(Leave the counter increment logic in place.)
|
||
|
|
||
|
With only that change, run again (but save yourself some scrollback):
|
||
|
|
||
|
----
|
||
|
$ GIT_TRACE=1 ./bin-wrappers/git walken | head -n 10
|
||
|
----
|
||
|
|
||
|
Take a look at the top commit with `git show` and the object ID you printed; it
|
||
|
should be the same as the output of `git show HEAD`.
|
||
|
|
||
|
Next, let's change a setting on our `struct rev_info` within
|
||
|
`walken_object_walk()`. Find where you're changing the other settings on `rev`,
|
||
|
such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
|
||
|
`reverse` setting at the bottom:
|
||
|
|
||
|
----
|
||
|
...
|
||
|
|
||
|
rev->tree_objects = 1;
|
||
|
rev->blob_objects = 1;
|
||
|
rev->tag_objects = 1;
|
||
|
rev->tree_blobs_in_commit_order = 1;
|
||
|
rev->reverse = 1;
|
||
|
|
||
|
...
|
||
|
----
|
||
|
|
||
|
Now, run again, but this time, let's grab the last handful of objects instead
|
||
|
of the first handful:
|
||
|
|
||
|
----
|
||
|
$ make
|
||
|
$ GIT_TRACE=1 ./bin-wrappers git walken | tail -n 10
|
||
|
----
|
||
|
|
||
|
The last commit object given should have the same OID as the one we saw at the
|
||
|
top before, and running `git show <oid>` with that OID should give you again
|
||
|
the same results as `git show HEAD`. Furthermore, if you run and examine the
|
||
|
first ten lines again (with `head` instead of `tail` like we did before applying
|
||
|
the `reverse` setting), you should see that now the first commit printed is the
|
||
|
initial commit, `e83c5163`.
|
||
|
|
||
|
== Wrapping Up
|
||
|
|
||
|
Let's review. In this tutorial, we:
|
||
|
|
||
|
- Built a commit walk from the ground up
|
||
|
- Enabled a grep filter for that commit walk
|
||
|
- Changed the sort order of that filtered commit walk
|
||
|
- Built an object walk (tags, commits, trees, and blobs) from the ground up
|
||
|
- Learned how to add a filter-spec to an object walk
|
||
|
- Changed the display order of the filtered object walk
|