Browse Source
Precompute and store information necessary for ancestry traversal in a separate file to optimize graph walking. * ds/commit-graph: commit-graph: implement "--append" option commit-graph: build graph from starting commits commit-graph: read only from specific pack-indexes commit: integrate commit graph with commit parsing commit-graph: close under reachability commit-graph: add core.commitGraph setting commit-graph: implement git commit-graph read commit-graph: implement git-commit-graph write commit-graph: implement write_commit_graph() commit-graph: create git-commit-graph builtin graph: add commit graph design document commit-graph: add format document csum-file: refactor finalize_hashfile() method csum-file: rename hashclose() to finalize_hashfile()maint
Junio C Hamano
7 years ago
30 changed files with 1587 additions and 21 deletions
@ -0,0 +1,94 @@
@@ -0,0 +1,94 @@
|
||||
git-commit-graph(1) |
||||
=================== |
||||
|
||||
NAME |
||||
---- |
||||
git-commit-graph - Write and verify Git commit graph files |
||||
|
||||
|
||||
SYNOPSIS |
||||
-------- |
||||
[verse] |
||||
'git commit-graph read' [--object-dir <dir>] |
||||
'git commit-graph write' <options> [--object-dir <dir>] |
||||
|
||||
|
||||
DESCRIPTION |
||||
----------- |
||||
|
||||
Manage the serialized commit graph file. |
||||
|
||||
|
||||
OPTIONS |
||||
------- |
||||
--object-dir:: |
||||
Use given directory for the location of packfiles and commit graph |
||||
file. This parameter exists to specify the location of an alternate |
||||
that only has the objects directory, not a full .git directory. The |
||||
commit graph file is expected to be at <dir>/info/commit-graph and |
||||
the packfiles are expected to be in <dir>/pack. |
||||
|
||||
|
||||
COMMANDS |
||||
-------- |
||||
'write':: |
||||
|
||||
Write a commit graph file based on the commits found in packfiles. |
||||
+ |
||||
With the `--stdin-packs` option, generate the new commit graph by |
||||
walking objects only in the specified pack-indexes. (Cannot be combined |
||||
with --stdin-commits.) |
||||
+ |
||||
With the `--stdin-commits` option, generate the new commit graph by |
||||
walking commits starting at the commits specified in stdin as a list |
||||
of OIDs in hex, one OID per line. (Cannot be combined with |
||||
--stdin-packs.) |
||||
+ |
||||
With the `--append` option, include all commits that are present in the |
||||
existing commit-graph file. |
||||
|
||||
'read':: |
||||
|
||||
Read a graph file given by the commit-graph file and output basic |
||||
details about the graph file. Used for debugging purposes. |
||||
|
||||
|
||||
EXAMPLES |
||||
-------- |
||||
|
||||
* Write a commit graph file for the packed commits in your local .git folder. |
||||
+ |
||||
------------------------------------------------ |
||||
$ git commit-graph write |
||||
------------------------------------------------ |
||||
|
||||
* Write a graph file, extending the current graph file using commits |
||||
* in <pack-index>. |
||||
+ |
||||
------------------------------------------------ |
||||
$ echo <pack-index> | git commit-graph write --stdin-packs |
||||
------------------------------------------------ |
||||
|
||||
* Write a graph file containing all reachable commits. |
||||
+ |
||||
------------------------------------------------ |
||||
$ git show-ref -s | git commit-graph write --stdin-commits |
||||
------------------------------------------------ |
||||
|
||||
* Write a graph file containing all commits in the current |
||||
* commit-graph file along with those reachable from HEAD. |
||||
+ |
||||
------------------------------------------------ |
||||
$ git rev-parse HEAD | git commit-graph write --stdin-commits --append |
||||
------------------------------------------------ |
||||
|
||||
* Read basic information from the commit-graph file. |
||||
+ |
||||
------------------------------------------------ |
||||
$ git commit-graph read |
||||
------------------------------------------------ |
||||
|
||||
|
||||
GIT |
||||
--- |
||||
Part of the linkgit:git[1] suite |
@ -0,0 +1,97 @@
@@ -0,0 +1,97 @@
|
||||
Git commit graph format |
||||
======================= |
||||
|
||||
The Git commit graph stores a list of commit OIDs and some associated |
||||
metadata, including: |
||||
|
||||
- The generation number of the commit. Commits with no parents have |
||||
generation number 1; commits with parents have generation number |
||||
one more than the maximum generation number of its parents. We |
||||
reserve zero as special, and can be used to mark a generation |
||||
number invalid or as "not computed". |
||||
|
||||
- The root tree OID. |
||||
|
||||
- The commit date. |
||||
|
||||
- The parents of the commit, stored using positional references within |
||||
the graph file. |
||||
|
||||
These positional references are stored as unsigned 32-bit integers |
||||
corresponding to the array position withing the list of commit OIDs. We |
||||
use the most-significant bit for special purposes, so we can store at most |
||||
(1 << 31) - 1 (around 2 billion) commits. |
||||
|
||||
== Commit graph files have the following format: |
||||
|
||||
In order to allow extensions that add extra data to the graph, we organize |
||||
the body into "chunks" and provide a binary lookup table at the beginning |
||||
of the body. The header includes certain values, such as number of chunks |
||||
and hash type. |
||||
|
||||
All 4-byte numbers are in network order. |
||||
|
||||
HEADER: |
||||
|
||||
4-byte signature: |
||||
The signature is: {'C', 'G', 'P', 'H'} |
||||
|
||||
1-byte version number: |
||||
Currently, the only valid version is 1. |
||||
|
||||
1-byte Hash Version (1 = SHA-1) |
||||
We infer the hash length (H) from this value. |
||||
|
||||
1-byte number (C) of "chunks" |
||||
|
||||
1-byte (reserved for later use) |
||||
Current clients should ignore this value. |
||||
|
||||
CHUNK LOOKUP: |
||||
|
||||
(C + 1) * 12 bytes listing the table of contents for the chunks: |
||||
First 4 bytes describe the chunk id. Value 0 is a terminating label. |
||||
Other 8 bytes provide the byte-offset in current file for chunk to |
||||
start. (Chunks are ordered contiguously in the file, so you can infer |
||||
the length using the next chunk position if necessary.) Each chunk |
||||
ID appears at most once. |
||||
|
||||
The remaining data in the body is described one chunk at a time, and |
||||
these chunks may be given in any order. Chunks are required unless |
||||
otherwise specified. |
||||
|
||||
CHUNK DATA: |
||||
|
||||
OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes) |
||||
The ith entry, F[i], stores the number of OIDs with first |
||||
byte at most i. Thus F[255] stores the total |
||||
number of commits (N). |
||||
|
||||
OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes) |
||||
The OIDs for all commits in the graph, sorted in ascending order. |
||||
|
||||
Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes) |
||||
* The first H bytes are for the OID of the root tree. |
||||
* The next 8 bytes are for the positions of the first two parents |
||||
of the ith commit. Stores value 0xffffffff if no parent in that |
||||
position. If there are more than two parents, the second value |
||||
has its most-significant bit on and the other bits store an array |
||||
position into the Large Edge List chunk. |
||||
* The next 8 bytes store the generation number of the commit and |
||||
the commit time in seconds since EPOCH. The generation number |
||||
uses the higher 30 bits of the first 4 bytes, while the commit |
||||
time uses the 32 bits of the second 4 bytes, along with the lowest |
||||
2 bits of the lowest byte, storing the 33rd and 34th bit of the |
||||
commit time. |
||||
|
||||
Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional] |
||||
This list of 4-byte values store the second through nth parents for |
||||
all octopus merges. The second parent value in the commit data stores |
||||
an array position within this list along with the most-significant bit |
||||
on. Starting at that array position, iterate through this list of commit |
||||
positions for the parents until reaching a value with the most-significant |
||||
bit on. The other bits correspond to the position of the last parent. |
||||
|
||||
TRAILER: |
||||
|
||||
H-byte HASH-checksum of all of the above. |
@ -0,0 +1,163 @@
@@ -0,0 +1,163 @@
|
||||
Git Commit Graph Design Notes |
||||
============================= |
||||
|
||||
Git walks the commit graph for many reasons, including: |
||||
|
||||
1. Listing and filtering commit history. |
||||
2. Computing merge bases. |
||||
|
||||
These operations can become slow as the commit count grows. The merge |
||||
base calculation shows up in many user-facing commands, such as 'merge-base' |
||||
or 'status' and can take minutes to compute depending on history shape. |
||||
|
||||
There are two main costs here: |
||||
|
||||
1. Decompressing and parsing commits. |
||||
2. Walking the entire graph to satisfy topological order constraints. |
||||
|
||||
The commit graph file is a supplemental data structure that accelerates |
||||
commit graph walks. If a user downgrades or disables the 'core.commitGraph' |
||||
config setting, then the existing ODB is sufficient. The file is stored |
||||
as "commit-graph" either in the .git/objects/info directory or in the info |
||||
directory of an alternate. |
||||
|
||||
The commit graph file stores the commit graph structure along with some |
||||
extra metadata to speed up graph walks. By listing commit OIDs in lexi- |
||||
cographic order, we can identify an integer position for each commit and |
||||
refer to the parents of a commit using those integer positions. We use |
||||
binary search to find initial commits and then use the integer positions |
||||
for fast lookups during the walk. |
||||
|
||||
A consumer may load the following info for a commit from the graph: |
||||
|
||||
1. The commit OID. |
||||
2. The list of parents, along with their integer position. |
||||
3. The commit date. |
||||
4. The root tree OID. |
||||
5. The generation number (see definition below). |
||||
|
||||
Values 1-4 satisfy the requirements of parse_commit_gently(). |
||||
|
||||
Define the "generation number" of a commit recursively as follows: |
||||
|
||||
* A commit with no parents (a root commit) has generation number one. |
||||
|
||||
* A commit with at least one parent has generation number one more than |
||||
the largest generation number among its parents. |
||||
|
||||
Equivalently, the generation number of a commit A is one more than the |
||||
length of a longest path from A to a root commit. The recursive definition |
||||
is easier to use for computation and observing the following property: |
||||
|
||||
If A and B are commits with generation numbers N and M, respectively, |
||||
and N <= M, then A cannot reach B. That is, we know without searching |
||||
that B is not an ancestor of A because it is further from a root commit |
||||
than A. |
||||
|
||||
Conversely, when checking if A is an ancestor of B, then we only need |
||||
to walk commits until all commits on the walk boundary have generation |
||||
number at most N. If we walk commits using a priority queue seeded by |
||||
generation numbers, then we always expand the boundary commit with highest |
||||
generation number and can easily detect the stopping condition. |
||||
|
||||
This property can be used to significantly reduce the time it takes to |
||||
walk commits and determine topological relationships. Without generation |
||||
numbers, the general heuristic is the following: |
||||
|
||||
If A and B are commits with commit time X and Y, respectively, and |
||||
X < Y, then A _probably_ cannot reach B. |
||||
|
||||
This heuristic is currently used whenever the computation is allowed to |
||||
violate topological relationships due to clock skew (such as "git log" |
||||
with default order), but is not used when the topological order is |
||||
required (such as merge base calculations, "git log --graph"). |
||||
|
||||
In practice, we expect some commits to be created recently and not stored |
||||
in the commit graph. We can treat these commits as having "infinite" |
||||
generation number and walk until reaching commits with known generation |
||||
number. |
||||
|
||||
Design Details |
||||
-------------- |
||||
|
||||
- The commit graph file is stored in a file named 'commit-graph' in the |
||||
.git/objects/info directory. This could be stored in the info directory |
||||
of an alternate. |
||||
|
||||
- The core.commitGraph config setting must be on to consume graph files. |
||||
|
||||
- The file format includes parameters for the object ID hash function, |
||||
so a future change of hash algorithm does not require a change in format. |
||||
|
||||
Future Work |
||||
----------- |
||||
|
||||
- The commit graph feature currently does not honor commit grafts. This can |
||||
be remedied by duplicating or refactoring the current graft logic. |
||||
|
||||
- The 'commit-graph' subcommand does not have a "verify" mode that is |
||||
necessary for integration with fsck. |
||||
|
||||
- The file format includes room for precomputed generation numbers. These |
||||
are not currently computed, so all generation numbers will be marked as |
||||
0 (or "uncomputed"). A later patch will include this calculation. |
||||
|
||||
- After computing and storing generation numbers, we must make graph |
||||
walks aware of generation numbers to gain the performance benefits they |
||||
enable. This will mostly be accomplished by swapping a commit-date-ordered |
||||
priority queue with one ordered by generation number. The following |
||||
operations are important candidates: |
||||
|
||||
- paint_down_to_common() |
||||
- 'log --topo-order' |
||||
|
||||
- Currently, parse_commit_gently() requires filling in the root tree |
||||
object for a commit. This passes through lookup_tree() and consequently |
||||
lookup_object(). Also, it calls lookup_commit() when loading the parents. |
||||
These method calls check the ODB for object existence, even if the |
||||
consumer does not need the content. For example, we do not need the |
||||
tree contents when computing merge bases. Now that commit parsing is |
||||
removed from the computation time, these lookup operations are the |
||||
slowest operations keeping graph walks from being fast. Consider |
||||
loading these objects without verifying their existence in the ODB and |
||||
only loading them fully when consumers need them. Consider a method |
||||
such as "ensure_tree_loaded(commit)" that fully loads a tree before |
||||
using commit->tree. |
||||
|
||||
- The current design uses the 'commit-graph' subcommand to generate the graph. |
||||
When this feature stabilizes enough to recommend to most users, we should |
||||
add automatic graph writes to common operations that create many commits. |
||||
For example, one could compute a graph on 'clone', 'fetch', or 'repack' |
||||
commands. |
||||
|
||||
- A server could provide a commit graph file as part of the network protocol |
||||
to avoid extra calculations by clients. This feature is only of benefit if |
||||
the user is willing to trust the file, because verifying the file is correct |
||||
is as hard as computing it from scratch. |
||||
|
||||
Related Links |
||||
------------- |
||||
[0] https://bugs.chromium.org/p/git/issues/detail?id=8 |
||||
Chromium work item for: Serialized Commit Graph |
||||
|
||||
[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/ |
||||
An abandoned patch that introduced generation numbers. |
||||
|
||||
[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/ |
||||
Discussion about generation numbers on commits and how they interact |
||||
with fsck. |
||||
|
||||
[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/ |
||||
More discussion about generation numbers and not storing them inside |
||||
commit objects. A valuable quote: |
||||
|
||||
"I think we should be moving more in the direction of keeping |
||||
repo-local caches for optimizations. Reachability bitmaps have been |
||||
a big performance win. I think we should be doing the same with our |
||||
properties of commits. Not just generation numbers, but making it |
||||
cheap to access the graph structure without zlib-inflating whole |
||||
commit objects (i.e., packv4 or something like the "metapacks" I |
||||
proposed a few years ago)." |
||||
|
||||
[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u |
||||
A patch to remove the ahead-behind calculation from 'status'. |
@ -0,0 +1,171 @@
@@ -0,0 +1,171 @@
|
||||
#include "builtin.h" |
||||
#include "config.h" |
||||
#include "dir.h" |
||||
#include "lockfile.h" |
||||
#include "parse-options.h" |
||||
#include "commit-graph.h" |
||||
|
||||
static char const * const builtin_commit_graph_usage[] = { |
||||
N_("git commit-graph [--object-dir <objdir>]"), |
||||
N_("git commit-graph read [--object-dir <objdir>]"), |
||||
N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"), |
||||
NULL |
||||
}; |
||||
|
||||
static const char * const builtin_commit_graph_read_usage[] = { |
||||
N_("git commit-graph read [--object-dir <objdir>]"), |
||||
NULL |
||||
}; |
||||
|
||||
static const char * const builtin_commit_graph_write_usage[] = { |
||||
N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"), |
||||
NULL |
||||
}; |
||||
|
||||
static struct opts_commit_graph { |
||||
const char *obj_dir; |
||||
int stdin_packs; |
||||
int stdin_commits; |
||||
int append; |
||||
} opts; |
||||
|
||||
static int graph_read(int argc, const char **argv) |
||||
{ |
||||
struct commit_graph *graph = NULL; |
||||
char *graph_name; |
||||
|
||||
static struct option builtin_commit_graph_read_options[] = { |
||||
OPT_STRING(0, "object-dir", &opts.obj_dir, |
||||
N_("dir"), |
||||
N_("The object directory to store the graph")), |
||||
OPT_END(), |
||||
}; |
||||
|
||||
argc = parse_options(argc, argv, NULL, |
||||
builtin_commit_graph_read_options, |
||||
builtin_commit_graph_read_usage, 0); |
||||
|
||||
if (!opts.obj_dir) |
||||
opts.obj_dir = get_object_directory(); |
||||
|
||||
graph_name = get_commit_graph_filename(opts.obj_dir); |
||||
graph = load_commit_graph_one(graph_name); |
||||
|
||||
if (!graph) |
||||
die("graph file %s does not exist", graph_name); |
||||
FREE_AND_NULL(graph_name); |
||||
|
||||
printf("header: %08x %d %d %d %d\n", |
||||
ntohl(*(uint32_t*)graph->data), |
||||
*(unsigned char*)(graph->data + 4), |
||||
*(unsigned char*)(graph->data + 5), |
||||
*(unsigned char*)(graph->data + 6), |
||||
*(unsigned char*)(graph->data + 7)); |
||||
printf("num_commits: %u\n", graph->num_commits); |
||||
printf("chunks:"); |
||||
|
||||
if (graph->chunk_oid_fanout) |
||||
printf(" oid_fanout"); |
||||
if (graph->chunk_oid_lookup) |
||||
printf(" oid_lookup"); |
||||
if (graph->chunk_commit_data) |
||||
printf(" commit_metadata"); |
||||
if (graph->chunk_large_edges) |
||||
printf(" large_edges"); |
||||
printf("\n"); |
||||
|
||||
return 0; |
||||
} |
||||
|
||||
static int graph_write(int argc, const char **argv) |
||||
{ |
||||
const char **pack_indexes = NULL; |
||||
int packs_nr = 0; |
||||
const char **commit_hex = NULL; |
||||
int commits_nr = 0; |
||||
const char **lines = NULL; |
||||
int lines_nr = 0; |
||||
int lines_alloc = 0; |
||||
|
||||
static struct option builtin_commit_graph_write_options[] = { |
||||
OPT_STRING(0, "object-dir", &opts.obj_dir, |
||||
N_("dir"), |
||||
N_("The object directory to store the graph")), |
||||
OPT_BOOL(0, "stdin-packs", &opts.stdin_packs, |
||||
N_("scan pack-indexes listed by stdin for commits")), |
||||
OPT_BOOL(0, "stdin-commits", &opts.stdin_commits, |
||||
N_("start walk at commits listed by stdin")), |
||||
OPT_BOOL(0, "append", &opts.append, |
||||
N_("include all commits already in the commit-graph file")), |
||||
OPT_END(), |
||||
}; |
||||
|
||||
argc = parse_options(argc, argv, NULL, |
||||
builtin_commit_graph_write_options, |
||||
builtin_commit_graph_write_usage, 0); |
||||
|
||||
if (opts.stdin_packs && opts.stdin_commits) |
||||
die(_("cannot use both --stdin-commits and --stdin-packs")); |
||||
if (!opts.obj_dir) |
||||
opts.obj_dir = get_object_directory(); |
||||
|
||||
if (opts.stdin_packs || opts.stdin_commits) { |
||||
struct strbuf buf = STRBUF_INIT; |
||||
lines_nr = 0; |
||||
lines_alloc = 128; |
||||
ALLOC_ARRAY(lines, lines_alloc); |
||||
|
||||
while (strbuf_getline(&buf, stdin) != EOF) { |
||||
ALLOC_GROW(lines, lines_nr + 1, lines_alloc); |
||||
lines[lines_nr++] = strbuf_detach(&buf, NULL); |
||||
} |
||||
|
||||
if (opts.stdin_packs) { |
||||
pack_indexes = lines; |
||||
packs_nr = lines_nr; |
||||
} |
||||
if (opts.stdin_commits) { |
||||
commit_hex = lines; |
||||
commits_nr = lines_nr; |
||||
} |
||||
} |
||||
|
||||
write_commit_graph(opts.obj_dir, |
||||
pack_indexes, |
||||
packs_nr, |
||||
commit_hex, |
||||
commits_nr, |
||||
opts.append); |
||||
|
||||
return 0; |
||||
} |
||||
|
||||
int cmd_commit_graph(int argc, const char **argv, const char *prefix) |
||||
{ |
||||
static struct option builtin_commit_graph_options[] = { |
||||
OPT_STRING(0, "object-dir", &opts.obj_dir, |
||||
N_("dir"), |
||||
N_("The object directory to store the graph")), |
||||
OPT_END(), |
||||
}; |
||||
|
||||
if (argc == 2 && !strcmp(argv[1], "-h")) |
||||
usage_with_options(builtin_commit_graph_usage, |
||||
builtin_commit_graph_options); |
||||
|
||||
git_config(git_default_config, NULL); |
||||
argc = parse_options(argc, argv, prefix, |
||||
builtin_commit_graph_options, |
||||
builtin_commit_graph_usage, |
||||
PARSE_OPT_STOP_AT_NON_OPTION); |
||||
|
||||
if (argc > 0) { |
||||
if (!strcmp(argv[0], "read")) |
||||
return graph_read(argc, argv); |
||||
if (!strcmp(argv[0], "write")) |
||||
return graph_write(argc, argv); |
||||
} |
||||
|
||||
usage_with_options(builtin_commit_graph_usage, |
||||
builtin_commit_graph_options); |
||||
} |
@ -0,0 +1,741 @@
@@ -0,0 +1,741 @@
|
||||
#include "cache.h" |
||||
#include "config.h" |
||||
#include "git-compat-util.h" |
||||
#include "lockfile.h" |
||||
#include "pack.h" |
||||
#include "packfile.h" |
||||
#include "commit.h" |
||||
#include "object.h" |
||||
#include "revision.h" |
||||
#include "sha1-lookup.h" |
||||
#include "commit-graph.h" |
||||
#include "object-store.h" |
||||
|
||||
#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */ |
||||
#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ |
||||
#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ |
||||
#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */ |
||||
#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */ |
||||
|
||||
#define GRAPH_DATA_WIDTH 36 |
||||
|
||||
#define GRAPH_VERSION_1 0x1 |
||||
#define GRAPH_VERSION GRAPH_VERSION_1 |
||||
|
||||
#define GRAPH_OID_VERSION_SHA1 1 |
||||
#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ |
||||
#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1 |
||||
#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1 |
||||
|
||||
#define GRAPH_OCTOPUS_EDGES_NEEDED 0x80000000 |
||||
#define GRAPH_PARENT_MISSING 0x7fffffff |
||||
#define GRAPH_EDGE_LAST_MASK 0x7fffffff |
||||
#define GRAPH_PARENT_NONE 0x70000000 |
||||
|
||||
#define GRAPH_LAST_EDGE 0x80000000 |
||||
|
||||
#define GRAPH_FANOUT_SIZE (4 * 256) |
||||
#define GRAPH_CHUNKLOOKUP_WIDTH 12 |
||||
#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \ |
||||
GRAPH_OID_LEN + 8) |
||||
|
||||
char *get_commit_graph_filename(const char *obj_dir) |
||||
{ |
||||
return xstrfmt("%s/info/commit-graph", obj_dir); |
||||
} |
||||
|
||||
static struct commit_graph *alloc_commit_graph(void) |
||||
{ |
||||
struct commit_graph *g = xcalloc(1, sizeof(*g)); |
||||
g->graph_fd = -1; |
||||
|
||||
return g; |
||||
} |
||||
|
||||
struct commit_graph *load_commit_graph_one(const char *graph_file) |
||||
{ |
||||
void *graph_map; |
||||
const unsigned char *data, *chunk_lookup; |
||||
size_t graph_size; |
||||
struct stat st; |
||||
uint32_t i; |
||||
struct commit_graph *graph; |
||||
int fd = git_open(graph_file); |
||||
uint64_t last_chunk_offset; |
||||
uint32_t last_chunk_id; |
||||
uint32_t graph_signature; |
||||
unsigned char graph_version, hash_version; |
||||
|
||||
if (fd < 0) |
||||
return NULL; |
||||
if (fstat(fd, &st)) { |
||||
close(fd); |
||||
return NULL; |
||||
} |
||||
graph_size = xsize_t(st.st_size); |
||||
|
||||
if (graph_size < GRAPH_MIN_SIZE) { |
||||
close(fd); |
||||
die("graph file %s is too small", graph_file); |
||||
} |
||||
graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0); |
||||
data = (const unsigned char *)graph_map; |
||||
|
||||
graph_signature = get_be32(data); |
||||
if (graph_signature != GRAPH_SIGNATURE) { |
||||
error("graph signature %X does not match signature %X", |
||||
graph_signature, GRAPH_SIGNATURE); |
||||
goto cleanup_fail; |
||||
} |
||||
|
||||
graph_version = *(unsigned char*)(data + 4); |
||||
if (graph_version != GRAPH_VERSION) { |
||||
error("graph version %X does not match version %X", |
||||
graph_version, GRAPH_VERSION); |
||||
goto cleanup_fail; |
||||
} |
||||
|
||||
hash_version = *(unsigned char*)(data + 5); |
||||
if (hash_version != GRAPH_OID_VERSION) { |
||||
error("hash version %X does not match version %X", |
||||
hash_version, GRAPH_OID_VERSION); |
||||
goto cleanup_fail; |
||||
} |
||||
|
||||
graph = alloc_commit_graph(); |
||||
|
||||
graph->hash_len = GRAPH_OID_LEN; |
||||
graph->num_chunks = *(unsigned char*)(data + 6); |
||||
graph->graph_fd = fd; |
||||
graph->data = graph_map; |
||||
graph->data_len = graph_size; |
||||
|
||||
last_chunk_id = 0; |
||||
last_chunk_offset = 8; |
||||
chunk_lookup = data + 8; |
||||
for (i = 0; i < graph->num_chunks; i++) { |
||||
uint32_t chunk_id = get_be32(chunk_lookup + 0); |
||||
uint64_t chunk_offset = get_be64(chunk_lookup + 4); |
||||
int chunk_repeated = 0; |
||||
|
||||
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH; |
||||
|
||||
if (chunk_offset > graph_size - GIT_MAX_RAWSZ) { |
||||
error("improper chunk offset %08x%08x", (uint32_t)(chunk_offset >> 32), |
||||
(uint32_t)chunk_offset); |
||||
goto cleanup_fail; |
||||
} |
||||
|
||||
switch (chunk_id) { |
||||
case GRAPH_CHUNKID_OIDFANOUT: |
||||
if (graph->chunk_oid_fanout) |
||||
chunk_repeated = 1; |
||||
else |
||||
graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset); |
||||
break; |
||||
|
||||
case GRAPH_CHUNKID_OIDLOOKUP: |
||||
if (graph->chunk_oid_lookup) |
||||
chunk_repeated = 1; |
||||
else |
||||
graph->chunk_oid_lookup = data + chunk_offset; |
||||
break; |
||||
|
||||
case GRAPH_CHUNKID_DATA: |
||||
if (graph->chunk_commit_data) |
||||
chunk_repeated = 1; |
||||
else |
||||
graph->chunk_commit_data = data + chunk_offset; |
||||
break; |
||||
|
||||
case GRAPH_CHUNKID_LARGEEDGES: |
||||
if (graph->chunk_large_edges) |
||||
chunk_repeated = 1; |
||||
else |
||||
graph->chunk_large_edges = data + chunk_offset; |
||||
break; |
||||
} |
||||
|
||||
if (chunk_repeated) { |
||||
error("chunk id %08x appears multiple times", chunk_id); |
||||
goto cleanup_fail; |
||||
} |
||||
|
||||
if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP) |
||||
{ |
||||
graph->num_commits = (chunk_offset - last_chunk_offset) |
||||
/ graph->hash_len; |
||||
} |
||||
|
||||
last_chunk_id = chunk_id; |
||||
last_chunk_offset = chunk_offset; |
||||
} |
||||
|
||||
return graph; |
||||
|
||||
cleanup_fail: |
||||
munmap(graph_map, graph_size); |
||||
close(fd); |
||||
exit(1); |
||||
} |
||||
|
||||
/* global storage */ |
||||
static struct commit_graph *commit_graph = NULL; |
||||
|
||||
static void prepare_commit_graph_one(const char *obj_dir) |
||||
{ |
||||
char *graph_name; |
||||
|
||||
if (commit_graph) |
||||
return; |
||||
|
||||
graph_name = get_commit_graph_filename(obj_dir); |
||||
commit_graph = load_commit_graph_one(graph_name); |
||||
|
||||
FREE_AND_NULL(graph_name); |
||||
} |
||||
|
||||
static int prepare_commit_graph_run_once = 0; |
||||
static void prepare_commit_graph(void) |
||||
{ |
||||
struct alternate_object_database *alt; |
||||
char *obj_dir; |
||||
|
||||
if (prepare_commit_graph_run_once) |
||||
return; |
||||
prepare_commit_graph_run_once = 1; |
||||
|
||||
obj_dir = get_object_directory(); |
||||
prepare_commit_graph_one(obj_dir); |
||||
prepare_alt_odb(the_repository); |
||||
for (alt = the_repository->objects->alt_odb_list; |
||||
!commit_graph && alt; |
||||
alt = alt->next) |
||||
prepare_commit_graph_one(alt->path); |
||||
} |
||||
|
||||
static void close_commit_graph(void) |
||||
{ |
||||
if (!commit_graph) |
||||
return; |
||||
|
||||
if (commit_graph->graph_fd >= 0) { |
||||
munmap((void *)commit_graph->data, commit_graph->data_len); |
||||
commit_graph->data = NULL; |
||||
close(commit_graph->graph_fd); |
||||
} |
||||
|
||||
FREE_AND_NULL(commit_graph); |
||||
} |
||||
|
||||
static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t *pos) |
||||
{ |
||||
return bsearch_hash(oid->hash, g->chunk_oid_fanout, |
||||
g->chunk_oid_lookup, g->hash_len, pos); |
||||
} |
||||
|
||||
static struct commit_list **insert_parent_or_die(struct commit_graph *g, |
||||
uint64_t pos, |
||||
struct commit_list **pptr) |
||||
{ |
||||
struct commit *c; |
||||
struct object_id oid; |
||||
hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos); |
||||
c = lookup_commit(&oid); |
||||
if (!c) |
||||
die("could not find commit %s", oid_to_hex(&oid)); |
||||
c->graph_pos = pos; |
||||
return &commit_list_insert(c, pptr)->next; |
||||
} |
||||
|
||||
static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t pos) |
||||
{ |
||||
struct object_id oid; |
||||
uint32_t edge_value; |
||||
uint32_t *parent_data_ptr; |
||||
uint64_t date_low, date_high; |
||||
struct commit_list **pptr; |
||||
const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * pos; |
||||
|
||||
item->object.parsed = 1; |
||||
item->graph_pos = pos; |
||||
|
||||
hashcpy(oid.hash, commit_data); |
||||
item->tree = lookup_tree(&oid); |
||||
|
||||
date_high = get_be32(commit_data + g->hash_len + 8) & 0x3; |
||||
date_low = get_be32(commit_data + g->hash_len + 12); |
||||
item->date = (timestamp_t)((date_high << 32) | date_low); |
||||
|
||||
pptr = &item->parents; |
||||
|
||||
edge_value = get_be32(commit_data + g->hash_len); |
||||
if (edge_value == GRAPH_PARENT_NONE) |
||||
return 1; |
||||
pptr = insert_parent_or_die(g, edge_value, pptr); |
||||
|
||||
edge_value = get_be32(commit_data + g->hash_len + 4); |
||||
if (edge_value == GRAPH_PARENT_NONE) |
||||
return 1; |
||||
if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED)) { |
||||
pptr = insert_parent_or_die(g, edge_value, pptr); |
||||
return 1; |
||||
} |
||||
|
||||
parent_data_ptr = (uint32_t*)(g->chunk_large_edges + |
||||
4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK)); |
||||
do { |
||||
edge_value = get_be32(parent_data_ptr); |
||||
pptr = insert_parent_or_die(g, |
||||
edge_value & GRAPH_EDGE_LAST_MASK, |
||||
pptr); |
||||
parent_data_ptr++; |
||||
} while (!(edge_value & GRAPH_LAST_EDGE)); |
||||
|
||||
return 1; |
||||
} |
||||
|
||||
int parse_commit_in_graph(struct commit *item) |
||||
{ |
||||
if (!core_commit_graph) |
||||
return 0; |
||||
if (item->object.parsed) |
||||
return 1; |
||||
|
||||
prepare_commit_graph(); |
||||
if (commit_graph) { |
||||
uint32_t pos; |
||||
int found; |
||||
if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) { |
||||
pos = item->graph_pos; |
||||
found = 1; |
||||
} else { |
||||
found = bsearch_graph(commit_graph, &(item->object.oid), &pos); |
||||
} |
||||
|
||||
if (found) |
||||
return fill_commit_in_graph(item, commit_graph, pos); |
||||
} |
||||
|
||||
return 0; |
||||
} |
||||
|
||||
static void write_graph_chunk_fanout(struct hashfile *f, |
||||
struct commit **commits, |
||||
int nr_commits) |
||||
{ |
||||
int i, count = 0; |
||||
struct commit **list = commits; |
||||
|
||||
/* |
||||
* Write the first-level table (the list is sorted, |
||||
* but we use a 256-entry lookup to be able to avoid |
||||
* having to do eight extra binary search iterations). |
||||
*/ |
||||
for (i = 0; i < 256; i++) { |
||||
while (count < nr_commits) { |
||||
if ((*list)->object.oid.hash[0] != i) |
||||
break; |
||||
count++; |
||||
list++; |
||||
} |
||||
|
||||
hashwrite_be32(f, count); |
||||
} |
||||
} |
||||
|
||||
static void write_graph_chunk_oids(struct hashfile *f, int hash_len, |
||||
struct commit **commits, int nr_commits) |
||||
{ |
||||
struct commit **list = commits; |
||||
int count; |
||||
for (count = 0; count < nr_commits; count++, list++) |
||||
hashwrite(f, (*list)->object.oid.hash, (int)hash_len); |
||||
} |
||||
|
||||
static const unsigned char *commit_to_sha1(size_t index, void *table) |
||||
{ |
||||
struct commit **commits = table; |
||||
return commits[index]->object.oid.hash; |
||||
} |
||||
|
||||
static void write_graph_chunk_data(struct hashfile *f, int hash_len, |
||||
struct commit **commits, int nr_commits) |
||||
{ |
||||
struct commit **list = commits; |
||||
struct commit **last = commits + nr_commits; |
||||
uint32_t num_extra_edges = 0; |
||||
|
||||
while (list < last) { |
||||
struct commit_list *parent; |
||||
int edge_value; |
||||
uint32_t packedDate[2]; |
||||
|
||||
parse_commit(*list); |
||||
hashwrite(f, (*list)->tree->object.oid.hash, hash_len); |
||||
|
||||
parent = (*list)->parents; |
||||
|
||||
if (!parent) |
||||
edge_value = GRAPH_PARENT_NONE; |
||||
else { |
||||
edge_value = sha1_pos(parent->item->object.oid.hash, |
||||
commits, |
||||
nr_commits, |
||||
commit_to_sha1); |
||||
|
||||
if (edge_value < 0) |
||||
edge_value = GRAPH_PARENT_MISSING; |
||||
} |
||||
|
||||
hashwrite_be32(f, edge_value); |
||||
|
||||
if (parent) |
||||
parent = parent->next; |
||||
|
||||
if (!parent) |
||||
edge_value = GRAPH_PARENT_NONE; |
||||
else if (parent->next) |
||||
edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_extra_edges; |
||||
else { |
||||
edge_value = sha1_pos(parent->item->object.oid.hash, |
||||
commits, |
||||
nr_commits, |
||||
commit_to_sha1); |
||||
if (edge_value < 0) |
||||
edge_value = GRAPH_PARENT_MISSING; |
||||
} |
||||
|
||||
hashwrite_be32(f, edge_value); |
||||
|
||||
if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) { |
||||
do { |
||||
num_extra_edges++; |
||||
parent = parent->next; |
||||
} while (parent); |
||||
} |
||||
|
||||
if (sizeof((*list)->date) > 4) |
||||
packedDate[0] = htonl(((*list)->date >> 32) & 0x3); |
||||
else |
||||
packedDate[0] = 0; |
||||
|
||||
packedDate[1] = htonl((*list)->date); |
||||
hashwrite(f, packedDate, 8); |
||||
|
||||
list++; |
||||
} |
||||
} |
||||
|
||||
static void write_graph_chunk_large_edges(struct hashfile *f, |
||||
struct commit **commits, |
||||
int nr_commits) |
||||
{ |
||||
struct commit **list = commits; |
||||
struct commit **last = commits + nr_commits; |
||||
struct commit_list *parent; |
||||
|
||||
while (list < last) { |
||||
int num_parents = 0; |
||||
for (parent = (*list)->parents; num_parents < 3 && parent; |
||||
parent = parent->next) |
||||
num_parents++; |
||||
|
||||
if (num_parents <= 2) { |
||||
list++; |
||||
continue; |
||||
} |
||||
|
||||
/* Since num_parents > 2, this initializer is safe. */ |
||||
for (parent = (*list)->parents->next; parent; parent = parent->next) { |
||||
int edge_value = sha1_pos(parent->item->object.oid.hash, |
||||
commits, |
||||
nr_commits, |
||||
commit_to_sha1); |
||||
|
||||
if (edge_value < 0) |
||||
edge_value = GRAPH_PARENT_MISSING; |
||||
else if (!parent->next) |
||||
edge_value |= GRAPH_LAST_EDGE; |
||||
|
||||
hashwrite_be32(f, edge_value); |
||||
} |
||||
|
||||
list++; |
||||
} |
||||
} |
||||
|
||||
static int commit_compare(const void *_a, const void *_b) |
||||
{ |
||||
const struct object_id *a = (const struct object_id *)_a; |
||||
const struct object_id *b = (const struct object_id *)_b; |
||||
return oidcmp(a, b); |
||||
} |
||||
|
||||
struct packed_commit_list { |
||||
struct commit **list; |
||||
int nr; |
||||
int alloc; |
||||
}; |
||||
|
||||
struct packed_oid_list { |
||||
struct object_id *list; |
||||
int nr; |
||||
int alloc; |
||||
}; |
||||
|
||||
static int add_packed_commits(const struct object_id *oid, |
||||
struct packed_git *pack, |
||||
uint32_t pos, |
||||
void *data) |
||||
{ |
||||
struct packed_oid_list *list = (struct packed_oid_list*)data; |
||||
enum object_type type; |
||||
off_t offset = nth_packed_object_offset(pack, pos); |
||||
struct object_info oi = OBJECT_INFO_INIT; |
||||
|
||||
oi.typep = &type; |
||||
if (packed_object_info(pack, offset, &oi) < 0) |
||||
die("unable to get type of object %s", oid_to_hex(oid)); |
||||
|
||||
if (type != OBJ_COMMIT) |
||||
return 0; |
||||
|
||||
ALLOC_GROW(list->list, list->nr + 1, list->alloc); |
||||
oidcpy(&(list->list[list->nr]), oid); |
||||
list->nr++; |
||||
|
||||
return 0; |
||||
} |
||||
|
||||
static void add_missing_parents(struct packed_oid_list *oids, struct commit *commit) |
||||
{ |
||||
struct commit_list *parent; |
||||
for (parent = commit->parents; parent; parent = parent->next) { |
||||
if (!(parent->item->object.flags & UNINTERESTING)) { |
||||
ALLOC_GROW(oids->list, oids->nr + 1, oids->alloc); |
||||
oidcpy(&oids->list[oids->nr], &(parent->item->object.oid)); |
||||
oids->nr++; |
||||
parent->item->object.flags |= UNINTERESTING; |
||||
} |
||||
} |
||||
} |
||||
|
||||
static void close_reachable(struct packed_oid_list *oids) |
||||
{ |
||||
int i; |
||||
struct commit *commit; |
||||
|
||||
for (i = 0; i < oids->nr; i++) { |
||||
commit = lookup_commit(&oids->list[i]); |
||||
if (commit) |
||||
commit->object.flags |= UNINTERESTING; |
||||
} |
||||
|
||||
/* |
||||
* As this loop runs, oids->nr may grow, but not more |
||||
* than the number of missing commits in the reachable |
||||
* closure. |
||||
*/ |
||||
for (i = 0; i < oids->nr; i++) { |
||||
commit = lookup_commit(&oids->list[i]); |
||||
|
||||
if (commit && !parse_commit(commit)) |
||||
add_missing_parents(oids, commit); |
||||
} |
||||
|
||||
for (i = 0; i < oids->nr; i++) { |
||||
commit = lookup_commit(&oids->list[i]); |
||||
|
||||
if (commit) |
||||
commit->object.flags &= ~UNINTERESTING; |
||||
} |
||||
} |
||||
|
||||
void write_commit_graph(const char *obj_dir, |
||||
const char **pack_indexes, |
||||
int nr_packs, |
||||
const char **commit_hex, |
||||
int nr_commits, |
||||
int append) |
||||
{ |
||||
struct packed_oid_list oids; |
||||
struct packed_commit_list commits; |
||||
struct hashfile *f; |
||||
uint32_t i, count_distinct = 0; |
||||
char *graph_name; |
||||
int fd; |
||||
struct lock_file lk = LOCK_INIT; |
||||
uint32_t chunk_ids[5]; |
||||
uint64_t chunk_offsets[5]; |
||||
int num_chunks; |
||||
int num_extra_edges; |
||||
struct commit_list *parent; |
||||
|
||||
oids.nr = 0; |
||||
oids.alloc = approximate_object_count() / 4; |
||||
|
||||
if (append) { |
||||
prepare_commit_graph_one(obj_dir); |
||||
if (commit_graph) |
||||
oids.alloc += commit_graph->num_commits; |
||||
} |
||||
|
||||
if (oids.alloc < 1024) |
||||
oids.alloc = 1024; |
||||
ALLOC_ARRAY(oids.list, oids.alloc); |
||||
|
||||
if (append && commit_graph) { |
||||
for (i = 0; i < commit_graph->num_commits; i++) { |
||||
const unsigned char *hash = commit_graph->chunk_oid_lookup + |
||||
commit_graph->hash_len * i; |
||||
hashcpy(oids.list[oids.nr++].hash, hash); |
||||
} |
||||
} |
||||
|
||||
if (pack_indexes) { |
||||
struct strbuf packname = STRBUF_INIT; |
||||
int dirlen; |
||||
strbuf_addf(&packname, "%s/pack/", obj_dir); |
||||
dirlen = packname.len; |
||||
for (i = 0; i < nr_packs; i++) { |
||||
struct packed_git *p; |
||||
strbuf_setlen(&packname, dirlen); |
||||
strbuf_addstr(&packname, pack_indexes[i]); |
||||
p = add_packed_git(packname.buf, packname.len, 1); |
||||
if (!p) |
||||
die("error adding pack %s", packname.buf); |
||||
if (open_pack_index(p)) |
||||
die("error opening index for %s", packname.buf); |
||||
for_each_object_in_pack(p, add_packed_commits, &oids); |
||||
close_pack(p); |
||||
} |
||||
strbuf_release(&packname); |
||||
} |
||||
|
||||
if (commit_hex) { |
||||
for (i = 0; i < nr_commits; i++) { |
||||
const char *end; |
||||
struct object_id oid; |
||||
struct commit *result; |
||||
|
||||
if (commit_hex[i] && parse_oid_hex(commit_hex[i], &oid, &end)) |
||||
continue; |
||||
|
||||
result = lookup_commit_reference_gently(&oid, 1); |
||||
|
||||
if (result) { |
||||
ALLOC_GROW(oids.list, oids.nr + 1, oids.alloc); |
||||
oidcpy(&oids.list[oids.nr], &(result->object.oid)); |
||||
oids.nr++; |
||||
} |
||||
} |
||||
} |
||||
|
||||
if (!pack_indexes && !commit_hex) |
||||
for_each_packed_object(add_packed_commits, &oids, 0); |
||||
|
||||
close_reachable(&oids); |
||||
|
||||
QSORT(oids.list, oids.nr, commit_compare); |
||||
|
||||
count_distinct = 1; |
||||
for (i = 1; i < oids.nr; i++) { |
||||
if (oidcmp(&oids.list[i-1], &oids.list[i])) |
||||
count_distinct++; |
||||
} |
||||
|
||||
if (count_distinct >= GRAPH_PARENT_MISSING) |
||||
die(_("the commit graph format cannot write %d commits"), count_distinct); |
||||
|
||||
commits.nr = 0; |
||||
commits.alloc = count_distinct; |
||||
ALLOC_ARRAY(commits.list, commits.alloc); |
||||
|
||||
num_extra_edges = 0; |
||||
for (i = 0; i < oids.nr; i++) { |
||||
int num_parents = 0; |
||||
if (i > 0 && !oidcmp(&oids.list[i-1], &oids.list[i])) |
||||
continue; |
||||
|
||||
commits.list[commits.nr] = lookup_commit(&oids.list[i]); |
||||
parse_commit(commits.list[commits.nr]); |
||||
|
||||
for (parent = commits.list[commits.nr]->parents; |
||||
parent; parent = parent->next) |
||||
num_parents++; |
||||
|
||||
if (num_parents > 2) |
||||
num_extra_edges += num_parents - 1; |
||||
|
||||
commits.nr++; |
||||
} |
||||
num_chunks = num_extra_edges ? 4 : 3; |
||||
|
||||
if (commits.nr >= GRAPH_PARENT_MISSING) |
||||
die(_("too many commits to write graph")); |
||||
|
||||
graph_name = get_commit_graph_filename(obj_dir); |
||||
fd = hold_lock_file_for_update(&lk, graph_name, 0); |
||||
|
||||
if (fd < 0) { |
||||
struct strbuf folder = STRBUF_INIT; |
||||
strbuf_addstr(&folder, graph_name); |
||||
strbuf_setlen(&folder, strrchr(folder.buf, '/') - folder.buf); |
||||
|
||||
if (mkdir(folder.buf, 0777) < 0) |
||||
die_errno(_("cannot mkdir %s"), folder.buf); |
||||
strbuf_release(&folder); |
||||
|
||||
fd = hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR); |
||||
|
||||
if (fd < 0) |
||||
die_errno("unable to create '%s'", graph_name); |
||||
} |
||||
|
||||
f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); |
||||
|
||||
hashwrite_be32(f, GRAPH_SIGNATURE); |
||||
|
||||
hashwrite_u8(f, GRAPH_VERSION); |
||||
hashwrite_u8(f, GRAPH_OID_VERSION); |
||||
hashwrite_u8(f, num_chunks); |
||||
hashwrite_u8(f, 0); /* unused padding byte */ |
||||
|
||||
chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT; |
||||
chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP; |
||||
chunk_ids[2] = GRAPH_CHUNKID_DATA; |
||||
if (num_extra_edges) |
||||
chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES; |
||||
else |
||||
chunk_ids[3] = 0; |
||||
chunk_ids[4] = 0; |
||||
|
||||
chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH; |
||||
chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE; |
||||
chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr; |
||||
chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr; |
||||
chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges; |
||||
|
||||
for (i = 0; i <= num_chunks; i++) { |
||||
uint32_t chunk_write[3]; |
||||
|
||||
chunk_write[0] = htonl(chunk_ids[i]); |
||||
chunk_write[1] = htonl(chunk_offsets[i] >> 32); |
||||
chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff); |
||||
hashwrite(f, chunk_write, 12); |
||||
} |
||||
|
||||
write_graph_chunk_fanout(f, commits.list, commits.nr); |
||||
write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr); |
||||
write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr); |
||||
write_graph_chunk_large_edges(f, commits.list, commits.nr); |
||||
|
||||
close_commit_graph(); |
||||
finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC); |
||||
commit_lock_file(&lk); |
||||
|
||||
free(oids.list); |
||||
oids.alloc = 0; |
||||
oids.nr = 0; |
||||
} |
@ -0,0 +1,46 @@
@@ -0,0 +1,46 @@
|
||||
#ifndef COMMIT_GRAPH_H |
||||
#define COMMIT_GRAPH_H |
||||
|
||||
#include "git-compat-util.h" |
||||
|
||||
char *get_commit_graph_filename(const char *obj_dir); |
||||
|
||||
/* |
||||
* Given a commit struct, try to fill the commit struct info, including: |
||||
* 1. tree object |
||||
* 2. date |
||||
* 3. parents. |
||||
* |
||||
* Returns 1 if and only if the commit was found in the packed graph. |
||||
* |
||||
* See parse_commit_buffer() for the fallback after this call. |
||||
*/ |
||||
int parse_commit_in_graph(struct commit *item); |
||||
|
||||
struct commit_graph { |
||||
int graph_fd; |
||||
|
||||
const unsigned char *data; |
||||
size_t data_len; |
||||
|
||||
unsigned char hash_len; |
||||
unsigned char num_chunks; |
||||
uint32_t num_commits; |
||||
struct object_id oid; |
||||
|
||||
const uint32_t *chunk_oid_fanout; |
||||
const unsigned char *chunk_oid_lookup; |
||||
const unsigned char *chunk_commit_data; |
||||
const unsigned char *chunk_large_edges; |
||||
}; |
||||
|
||||
struct commit_graph *load_commit_graph_one(const char *graph_file); |
||||
|
||||
void write_commit_graph(const char *obj_dir, |
||||
const char **pack_indexes, |
||||
int nr_packs, |
||||
const char **commit_hex, |
||||
int nr_commits, |
||||
int append); |
||||
|
||||
#endif |
@ -0,0 +1,224 @@
@@ -0,0 +1,224 @@
|
||||
#!/bin/sh |
||||
|
||||
test_description='commit graph' |
||||
. ./test-lib.sh |
||||
|
||||
test_expect_success 'setup full repo' ' |
||||
mkdir full && |
||||
cd "$TRASH_DIRECTORY/full" && |
||||
git init && |
||||
git config core.commitGraph true && |
||||
objdir=".git/objects" |
||||
' |
||||
|
||||
test_expect_success 'write graph with no packs' ' |
||||
cd "$TRASH_DIRECTORY/full" && |
||||
git commit-graph write --object-dir . && |
||||
test_path_is_file info/commit-graph |
||||
' |
||||
|
||||
test_expect_success 'create commits and repack' ' |
||||
cd "$TRASH_DIRECTORY/full" && |
||||
for i in $(test_seq 3) |
||||
do |
||||
test_commit $i && |
||||
git branch commits/$i |
||||
done && |
||||
git repack |
||||
' |
||||
|
||||
graph_git_two_modes() { |
||||
git -c core.graph=true $1 >output |
||||
git -c core.graph=false $1 >expect |
||||
test_cmp output expect |
||||
} |
||||
|
||||
graph_git_behavior() { |
||||
MSG=$1 |
||||
DIR=$2 |
||||
BRANCH=$3 |
||||
COMPARE=$4 |
||||
test_expect_success "check normal git operations: $MSG" ' |
||||
cd "$TRASH_DIRECTORY/$DIR" && |
||||
graph_git_two_modes "log --oneline $BRANCH" && |
||||
graph_git_two_modes "log --topo-order $BRANCH" && |
||||
graph_git_two_modes "log --graph $COMPARE..$BRANCH" && |
||||
graph_git_two_modes "branch -vv" && |
||||
graph_git_two_modes "merge-base -a $BRANCH $COMPARE" |
||||
' |
||||
} |
||||
|
||||
graph_git_behavior 'no graph' full commits/3 commits/1 |
||||
|
||||
graph_read_expect() { |
||||
OPTIONAL="" |
||||
NUM_CHUNKS=3 |
||||
if test ! -z $2 |
||||
then |
||||
OPTIONAL=" $2" |
||||
NUM_CHUNKS=$((3 + $(echo "$2" | wc -w))) |
||||
fi |
||||
cat >expect <<- EOF |
||||
header: 43475048 1 1 $NUM_CHUNKS 0 |
||||
num_commits: $1 |
||||
chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL |
||||
EOF |
||||
git commit-graph read >output && |
||||
test_cmp expect output |
||||
} |
||||
|
||||
test_expect_success 'write graph' ' |
||||
cd "$TRASH_DIRECTORY/full" && |
||||
graph1=$(git commit-graph write) && |
||||
test_path_is_file $objdir/info/commit-graph && |
||||
graph_read_expect "3" |
||||
' |
||||
|
||||
graph_git_behavior 'graph exists' full commits/3 commits/1 |
||||
|
||||
test_expect_success 'Add more commits' ' |
||||
cd "$TRASH_DIRECTORY/full" && |
||||
git reset --hard commits/1 && |
||||
for i in $(test_seq 4 5) |
||||
do |
||||
test_commit $i && |
||||
git branch commits/$i |
||||
done && |
||||
git reset --hard commits/2 && |
||||
for i in $(test_seq 6 7) |
||||
do |
||||
test_commit $i && |
||||
git branch commits/$i |
||||
done && |
||||
git reset --hard commits/2 && |
||||
git merge commits/4 && |
||||
git branch merge/1 && |
||||
git reset --hard commits/4 && |
||||
git merge commits/6 && |
||||
git branch merge/2 && |
||||
git reset --hard commits/3 && |
||||
git merge commits/5 commits/7 && |
||||
git branch merge/3 && |
||||
git repack |
||||
' |
||||
|
||||
# Current graph structure: |
||||
# |
||||
# __M3___ |
||||
# / | \ |
||||
# 3 M1 5 M2 7 |
||||
# |/ \|/ \| |
||||
# 2 4 6 |
||||
# |___/____/ |
||||
# 1 |
||||
|
||||
test_expect_success 'write graph with merges' ' |
||||
cd "$TRASH_DIRECTORY/full" && |
||||
git commit-graph write && |
||||
test_path_is_file $objdir/info/commit-graph && |
||||
graph_read_expect "10" "large_edges" |
||||
' |
||||
|
||||
graph_git_behavior 'merge 1 vs 2' full merge/1 merge/2 |
||||
graph_git_behavior 'merge 1 vs 3' full merge/1 merge/3 |
||||
graph_git_behavior 'merge 2 vs 3' full merge/2 merge/3 |
||||
|
||||
test_expect_success 'Add one more commit' ' |
||||
cd "$TRASH_DIRECTORY/full" && |
||||
test_commit 8 && |
||||
git branch commits/8 && |
||||
ls $objdir/pack | grep idx >existing-idx && |
||||
git repack && |
||||
ls $objdir/pack| grep idx | grep -v --file=existing-idx >new-idx |
||||
' |
||||
|
||||
# Current graph structure: |
||||
# |
||||
# 8 |
||||
# | |
||||
# __M3___ |
||||
# / | \ |
||||
# 3 M1 5 M2 7 |
||||
# |/ \|/ \| |
||||
# 2 4 6 |
||||
# |___/____/ |
||||
# 1 |
||||
|
||||
graph_git_behavior 'mixed mode, commit 8 vs merge 1' full commits/8 merge/1 |
||||
graph_git_behavior 'mixed mode, commit 8 vs merge 2' full commits/8 merge/2 |
||||
|
||||
test_expect_success 'write graph with new commit' ' |
||||
cd "$TRASH_DIRECTORY/full" && |
||||
git commit-graph write && |
||||
test_path_is_file $objdir/info/commit-graph && |
||||
graph_read_expect "11" "large_edges" |
||||
' |
||||
|
||||
graph_git_behavior 'full graph, commit 8 vs merge 1' full commits/8 merge/1 |
||||
graph_git_behavior 'full graph, commit 8 vs merge 2' full commits/8 merge/2 |
||||
|
||||
test_expect_success 'write graph with nothing new' ' |
||||
cd "$TRASH_DIRECTORY/full" && |
||||
git commit-graph write && |
||||
test_path_is_file $objdir/info/commit-graph && |
||||
graph_read_expect "11" "large_edges" |
||||
' |
||||
|
||||
graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1 |
||||
graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commits/8 merge/2 |
||||
|
||||
test_expect_success 'build graph from latest pack with closure' ' |
||||
cd "$TRASH_DIRECTORY/full" && |
||||
cat new-idx | git commit-graph write --stdin-packs && |
||||
test_path_is_file $objdir/info/commit-graph && |
||||
graph_read_expect "9" "large_edges" |
||||
' |
||||
|
||||
graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1 |
||||
graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commits/8 merge/2 |
||||
|
||||
test_expect_success 'build graph from commits with closure' ' |
||||
cd "$TRASH_DIRECTORY/full" && |
||||
git tag -a -m "merge" tag/merge merge/2 && |
||||
git rev-parse tag/merge >commits-in && |
||||
git rev-parse merge/1 >>commits-in && |
||||
cat commits-in | git commit-graph write --stdin-commits && |
||||
test_path_is_file $objdir/info/commit-graph && |
||||
graph_read_expect "6" |
||||
' |
||||
|
||||
graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1 |
||||
graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commits/8 merge/2 |
||||
|
||||
test_expect_success 'build graph from commits with append' ' |
||||
cd "$TRASH_DIRECTORY/full" && |
||||
git rev-parse merge/3 | git commit-graph write --stdin-commits --append && |
||||
test_path_is_file $objdir/info/commit-graph && |
||||
graph_read_expect "10" "large_edges" |
||||
' |
||||
|
||||
graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1 |
||||
graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2 |
||||
|
||||
test_expect_success 'setup bare repo' ' |
||||
cd "$TRASH_DIRECTORY" && |
||||
git clone --bare --no-local full bare && |
||||
cd bare && |
||||
git config core.commitGraph true && |
||||
baredir="./objects" |
||||
' |
||||
|
||||
graph_git_behavior 'bare repo, commit 8 vs merge 1' bare commits/8 merge/1 |
||||
graph_git_behavior 'bare repo, commit 8 vs merge 2' bare commits/8 merge/2 |
||||
|
||||
test_expect_success 'write graph in bare repo' ' |
||||
cd "$TRASH_DIRECTORY/bare" && |
||||
git commit-graph write && |
||||
test_path_is_file $baredir/info/commit-graph && |
||||
graph_read_expect "11" "large_edges" |
||||
' |
||||
|
||||
graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1 |
||||
graph_git_behavior 'bare repo with graph, commit 8 vs merge 2' bare commits/8 merge/2 |
||||
|
||||
test_done |
Loading…
Reference in new issue