Add Subproject Design Notes.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-22 23:53:07 -08:00 · 2006-01-22 23:53:07 -08:00 · 725ca8a8e7
parent 18ea0bf72c
commit 725ca8a8e7
2 changed files with 483 additions and 0 deletions
--- a/11
+++ b/11
@@ -0,0 +1,11 @@
+all:
+
+clean:
+	rm -f Subpro.html
+
+
+all: Subpro.html
+
+%.html: %.txt
+	asciidoc -bxhtml11 $*.txt
+
--- a/Subpro.txt
+++ b/Subpro.txt
@@ -0,0 +1,472 @@
+Notes on Subproject Support
+===========================
+Junio C Hamano
+
+Scenario
+--------
+
+The examples in the following discussion show how this proposal
+plans to support the following scenario:
+
+. A project to build an embedded Linux appliance "gadget" is
+  maintained with git.
+
+. The project uses linux-2.6 kernel as its subcomponent.  It
+  starts from a particular version of the mainline kernel, but
+  adds its own code and build infrastructure to fit the
+  appliance's needs.
+
+. The working tree of the project is laid out this way:
+
+------------
+ Makefile       - Builds the whole thing.
+ linux-2.6/     - The kernel, perhaps modified for the project.
+ appliance/     - Applications that run on the appliance, and
+                  other bits.
+------------
+
+. The project is willing to maintain its own changes outside the
+  Linux kernel project's tree, but wants to be able to feed its
+  changes upstream and to incorporate upstream changes into its
+  own tree, taking advantage of the fact that both it and the
+  Linux kernel project are version controlled with git.
+
+. To make the story a bit more interesting, later in the history
+  of development, `linux-2.6/` and `appliance/` directories will
+  be renamed to `kernel/` and `gadget/`.
+
+The idea here is to:
+
+. Keep `linux-2.6/` part as an independent project.  The work by
+  the project on the kernel part can be naturally exchanged with
+  the other kernel developers this way.  Specifically, a tree
+  object contained in commit objects belonging to this project
+  does *not* have `linux-2.6/` directory at the top.
+
+. Keep the `appliance/` part as another independent project.
+  Applications are supposed to be more or less independent from
+  the kernel version, but some other bits might be tied to a
+  specific kernel version.  Again, a tree object contained in
+  commit objects belonging to this project does *not* have
+  `appliance/` directory at the top.
+
+. Have another project that combines the whole thing together,
+  so that the project can keep track of which versions of the
+  parts are built together.
+
+We will call the project that binds things together the
+'toplevel project'.  Other projects that hold `linux-2.6/` part
+and `appliance/` part are called 'subprojects'.
+
+
+Setting up
+----------
+
+Let's say we have been working on the appliance software,
+independently version controlled with git.  Also the kernel part
+has been version controlled separately, like this:
+------------
+$ ls -dF current/*/.git current/*
+current/Makefile    current/appliance/.git/  current/linux-2.6/.git/
+current/appliance/  current/linux-2.6/
+------------
+
+Now we would want to get a combined project.  First we would
+clone from these repositories (which is not strictly needed --
+we could use `$GIT_ALTERNATE_OBJECT_DIRECTORIES` instead):
+
+------------
+$ mkdir combined && cd combined
+$ cp ../current/Makefile .
+$ git init-db
+$ mkdir -p .git/refs/subs/{kernel,gadget}/{heads,tags}
+$ kernel_commit=$(git clone-pack ../current/linux-2.6/ master | awk 'NR == 1 { print $1 }')
+$ gadget_commit=$(git clone-pack ../current/appliance/ master | awk 'NR == 1 { print $1 }')
+------------
+
+We will introduce a new command to set up a combined project:
+
+------------
+$ git bind-projects \
+	$kernel_commit linux-2.6/ \
+	$gadget_commit appliance/
+------------
+
+This would probably do an equivalent of:
+
+------------
+$ rm -f "$GIT_DIR/index"
+$ git read-tree --prefix=linux-2.6/ $kernel_commit
+$ git read-tree --prefix=appliance/ $gadget_commit
+$ git update-index --bind linux-2.6/ $kernel_commit
+$ git update-index --bind appliance/ $gadget_commit
+------------
+[NOTE]
+============
+Earlier outlines sent to the git mailing list talked
+about `$GIT_DIR/bind` to record which subprojects are bound to
+which subtree in the current working tree and index.  This
+proposal instead records that information in the index file
+with `update-index --bind` command.
+
+Also note that in this round of the proposal, there are no
+separate branches that keep track of heads of subprojects.
+============
+
+Let's not forget to add the `Makefile`, and check the whole
+thing out from the index file.
+------------
+$ git add Makefile
+$ git checkout-index -f -u -q -a
+------------
+
+Now our directory should be identical with the `current`
+directory.  After making sure of that, we should be able to
+commit the whole thing:
+
+------------
+$ diff -x .git -r ../current ../combined
+$ git commit -m 'Initial toplevel project commit'
+------------
+
+Which should create a new commit object that records what is in
+the index file as its tree, with `bind` lines to record which
+subproject commit objects are bound at what subdirectory, and
+updates the `$GIT_DIR/refs/heads/master`.  Such a commit object
+might look like this:
+------------
+tree 04803b09c300c8325258ccf2744115acc4c57067
+bind 5b2bcc7b2d546c636f79490655b3347acc91d17f linux-2.6/
+bind 0bdd79af62e8621359af08f0afca0ce977348ac7 appliance/
+author Junio C Hamano <junio@kernel.org> 1137965565 -0800
+committer Junio C Hamano <junio@kernel.org> 1137965565 -0800
+
+Initial toplevel project commit
+------------
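The `bind` header lines in the commit object above are easy to pick out with ordinary text tools. The following is only a sketch: `list_binds` is a hypothetical helper name, not part of git or of this proposal, and it assumes the header layout shown in the example (one `bind <commit> <path>` line per subproject, before the blank line that starts the log message).

```shell
# list_binds: read a commit object (as printed by `git cat-file commit`)
# on stdin and print "<commit> <path>" for every `bind` header line.
# Only the header is scanned: parsing stops at the first blank line,
# so a log message that happens to start with "bind " is ignored.
# (Hypothetical helper; the name is made up for illustration.)
list_binds () {
	sed -n -e '/^$/q' -e 's/^bind \([0-9a-f]\{40\}\) \(.*\)$/\1 \2/p'
}

# Sample commit object, copied from the example above.
sample_commit='tree 04803b09c300c8325258ccf2744115acc4c57067
bind 5b2bcc7b2d546c636f79490655b3347acc91d17f linux-2.6/
bind 0bdd79af62e8621359af08f0afca0ce977348ac7 appliance/
author Junio C Hamano <junio@kernel.org> 1137965565 -0800
committer Junio C Hamano <junio@kernel.org> 1137965565 -0800

Initial toplevel project commit'

printf '%s\n' "$sample_commit" | list_binds
```

For the sample above this prints the two object names, each followed by its subdirectory (`linux-2.6/` and `appliance/`).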
+
+Notice that `Makefile` at the top is part of the toplevel
+project in this example, but that is not required.  We could
+instead have the appliance subproject include this file.  In
+such a setup, the appliance subproject would have had `Makefile`
+and `appliance/` directory at the toplevel.  The `bind` line for
+that project would have said "the rest is bound at `/`" and
+`write-tree \--exclude=linux-2.6/` would have been used to write
+the tree for that subproject out of the combined index.
+
+
+Making further commits
+----------------------
+
+The easiest case is when you updated the Makefile without
+changing anything in the subprojects.  In such a case, we just
+need to create a new commit object that records the new tree
+with the current `HEAD` as its parent, and with the same set of
+`bind` lines.
+
+When we have changes to the subproject part, we would make a
+separate commit to the subproject part and then record the whole
+thing by making a commit to the toplevel project.  The user
+interaction might go this way:
+------------
+$ git commit
+error: you have changes to the subproject bound at linux-2.6/.
+$ git commit --subproject linux-2.6/
+$ git commit
+------------
+
+With the new `\--subproject` option, the directory structure
+rooted at `linux-2.6/` part is written out as a tree, and a new
+commit object that records that tree object with the commit
+bound to that portion of the tree (`5b2bcc7b` in the above
+example) as its parent is created.  Then the final `git commit`
+would record the whole tree with updated `bind` line for the
+`linux-2.6/` part.
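The tree-writing and commit-creating halves of this description can be approximated with plumbing that exists in git today. This is a sketch under assumptions, not the proposed implementation: it assumes a git new enough to have `git write-tree --prefix=` and `git commit-tree -m`, it works in a throwaway directory, and the file name `version.txt`, the identity, and the log messages are made up. Recording the result back with `update-index --bind` is not shown, because that option is only part of this proposal.

```shell
set -e

# Throwaway identity and scratch area; all names here are illustrative.
GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL='author@example.com'
GIT_COMMITTER_NAME='A U Thor' GIT_COMMITTER_EMAIL='author@example.com'
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
scratch=$(mktemp -d)

# A stand-alone "kernel" subproject with one commit.
git init -q "$scratch/kernel"
(
	cd "$scratch/kernel"
	echo 'v1' >version.txt
	git add version.txt
	git commit -q -m 'kernel: initial'
)

# The combined repository: fetch the subproject commit and graft its
# tree under linux-2.6/ in the index, then check it out.
git init -q "$scratch/combined"
cd "$scratch/combined"
git fetch -q ../kernel HEAD
kernel_commit=$(git rev-parse FETCH_HEAD)
git read-tree --prefix=linux-2.6/ "$kernel_commit"
git checkout-index -f -u -q -a

# Change something inside the subproject part and stage it.
echo 'v2' >linux-2.6/version.txt
git update-index linux-2.6/version.txt

# Roughly what `git commit --subproject linux-2.6/` is described as
# doing: write only the linux-2.6/ part as a tree, then commit it on
# top of the previously bound subproject commit.
sub_tree=$(git write-tree --prefix=linux-2.6/)
new_kernel_commit=$(git commit-tree "$sub_tree" -p "$kernel_commit" \
	-m 'kernel: bump version')

git cat-file -p "$new_kernel_commit"
```

The new commit's tree has `version.txt` at its top level, without a `linux-2.6/` directory, so it can be exchanged with the stand-alone kernel repository.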
+
+
+Checking out
+------------
+
+After cloning such a toplevel project, `git clone` without `-n`
+option would check out the working tree.  This is done by
+reading the tree object recorded in the commit object (which
+records the whole thing), and adding the information from the
+"bind" line to the index file.
+
+------------
+$ cd ..
+$ git clone -n combined cloned ;# clone the one we created earlier
+$ cd cloned
+$ git checkout
+------------
+
+This round of proposal does not maintain separate branch heads
+for subprojects.  The bound commits and their subdirectories
+are recorded in the index file from the commit object, so there
+is no need to do anything other than updating the index and the
+working tree.
+
+
+Switching branches
+------------------
+
+Along with the traditional two-way merge by `read-tree -m -u`,
+we would need to look at:
+
+. `bind` lines in the current `HEAD` commit.
+
+. `bind` lines in the commit we are switching to.
+
+. subproject binding information in the index file.
+
+to make sure we do sensible things.
+
+Just like until very recently we did not allow switching
+branches when two-way merge would lose local changes, we can
+start by refusing to switch branches when the subprojects bound
+in the index do not match what is recorded in the `HEAD` commit.
+
+Because in this round of the proposal we do not use the
+`$GIT_DIR/bind` file nor separate branches to keep track of
+heads of the subprojects, there is nothing else other than the
+working tree and the index file that needs to be updated when
+switching branches.
+
+
+Merging
+-------
+
+Merging two branches of the toplevel project can use the
+traditional merging mechanism mostly unchanged.  The merge base
+computation can be done using the `parent` ancestry information
+taken from the two toplevel project branch heads being merged,
+and merging of the whole tree can be done with a three-way merge
+of the whole tree using the merge base and two head commits.
+For reasons described later, we would not merge the subproject
+parts of the trees during this step, though.
+
+When the two branch heads use different versions of a subproject,
+things get a bit tricky.  First, let's forget for a moment about
+the case where they bind the same project at different locations.
+We would refuse to merge unless both heads have the same number
+of `bind` lines, binding something at the same subdirectories.
+
+------------
+$ git merge 'Merge in a side branch' HEAD side
+error: the merged heads have subprojects bound at different places.
+ ours:
+	linux-2.6/
+	appliance/
+ theirs:
+	kernel/
+	gadget/
+	manual/
+------------
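The refusal rule above amounts to comparing the sets of bound subdirectories in the two heads. A minimal sketch of that check follows, working on commit object text only; `bind_paths` and `same_bind_points` are hypothetical helper names, and the object names in the sample data are abbreviated placeholders rather than real commits.

```shell
# bind_paths: print the sorted subdirectories named on the `bind`
# header lines of a commit object read from stdin.
bind_paths () {
	sed -n -e '/^$/q' -e 's/^bind [0-9a-f]* \(.*\)$/\1/p' | sort
}

# same_bind_points OURS THEIRS: succeed only when both commit objects
# (given as text) bind subprojects at exactly the same subdirectories.
same_bind_points () {
	test "$(printf '%s\n' "$1" | bind_paths)" = \
	     "$(printf '%s\n' "$2" | bind_paths)"
}

# Abbreviated, made-up object names; only the paths matter here.
ours='tree 1111
bind aaaa linux-2.6/
bind bbbb appliance/
'
theirs='tree 2222
bind cccc kernel/
bind dddd gadget/
bind eeee manual/
'

if same_bind_points "$ours" "$theirs"
then
	echo 'bind points match'
else
	echo 'error: the merged heads have subprojects bound at different places.'
fi
```

With the sample data this takes the error branch, matching the situation in the transcript above. The comparison deliberately ignores which commit is bound at each path; that is a separate question, discussed later in this section.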
+
+Such renaming can be handled by first moving the bind points in
+our branch, and redoing the merge (this is a rare operation
+anyway).  It might go like this:
+
+------------
+$ git reset
+$ git update-index --unbind linux-2.6/
+$ git update-index --unbind appliance/
+$ git update-index --bind kernel/ $kernel_commit
+$ git update-index --bind gadget/ $gadget_commit
+$ git commit -m 'Prepare for merge with side branch'
+$ git merge 'Merge in a side branch' HEAD side
+error: the merged heads have subprojects bound at different places.
+ ours:
+	kernel/
+	gadget/
+ theirs:
+	kernel/
+	gadget/
+	manual/
+------------
+
+Their branch added another subproject, so this did not work (or
+it could be the other way around -- we might have been the one
+with `manual/` subproject while they didn't).  This suggests
+that we may want an option to `git merge` to allow taking a
+union of subprojects.  Again, this is a rare operation, and
+always taking a union would have created a toplevel project that
+had both `kernel/` and `linux-2.6/` bound to the same Linux
+kernel project from possibly different vintage, so it would be
+prudent to require the set of bound subprojects to exactly match
+and give the user an option to take a union.
+
+------------
+$ git merge --union-subprojects 'Merge in a side branch' HEAD side
+error: the subproject at 'kernel/' needs to be merged first.
+------------
+
+Here, the version of the Linux kernel project in the `side`
+branch was different from what our branch had on our `bind`
+line.  What kind of difference should trigger this error?
+Initially, I think we could require that one be a fast-forward
+of the other (ours might be ahead of theirs, or the other way
+around), and take the descendant.
+
+Or we could do an independent merge of subprojects heads, using
+the `parent` ancestry of the bound subproject heads to find
+their merge-base and doing a three-way merge.  This would leave
+the merge result in the subproject part of the working tree and
+the index.
+
+[NOTE]
+This is the reason we did not do the whole-tree three-way merge
+earlier.  The subproject commit bound to the merge base commit
+used for the toplevel project may not be the merge base between
+the subproject commits bound to the two toplevel project
+commits.
+
+So let's first deal with the case of merging only a subproject
+part into our tree.
+
+
+Merging subprojects
+-------------------
+
+An operation of more practical importance is to be able to merge
+in changes made elsewhere to the projects bound to our toplevel
+project.
+
+------------
+$ git pull --subproject=kernel/ git://git.kernel.org/.../linux-2.6/
+------------
+
+might do:
+
+. fetch the current `HEAD` commit from Linus.
+. find the subproject commit bound at kernel/ subtree.
+. perform the usual three-way merge of these two commits, in
+  `kernel/` part of the working tree.
+
+After that, `git commit \--subproject` option would be needed to
+make a commit.
+
+[NOTE]
+This suggests that we would need to have something similar to
+`MERGE_HEAD` for merging the subproject part.  In the case of
+merging two toplevel project commits, we probably can read the
+`bind` lines from the `MERGE_HEAD` commit and either our `HEAD`
+commit or our index file.  Further, we probably would require
+that the latter two match, just as we currently require that the
+index file match our `HEAD` commit before `git merge`.
+
+Just like the current `pull = fetch + merge` semantics, the
+subproject aware version `git pull \--subproject=frotz/` would be
+a `git fetch \--subproject=frotz/` followed by a `git merge
+\--subproject=frotz/`.  So the above would be:
+
+. Fetch the head.
+
+------------
+$ git fetch --subproject=kernel/ git://git.kernel.org/.../linux-2.6/
+------------
+
+which would fetch the commit chain from the remote repository, and
+write something like this to `FETCH_HEAD`:
+
+------------
+3ee68c4...\tfor-merge-into kernel/\tbranch 'master' of git://.../linux-2.6
+------------
+
+. Run `git merge`.
+
+------------
+$ git merge --subproject=kernel/ \
+    'Merge git://.../linux-2.6 into kernel/' HEAD 3ee68c4...
+------------
+
+. In case it does not cleanly automerge, `git merge` would write
+the necessary information for a later `git commit` to use in
+`MERGE_HEAD`.  It may look like this:
+
+------------
+3ee68c4af3fd7228c1be63254b9f884614f9ebb2	kernel/
+------------
+
+Similarly, `MERGE_MSG` file will hold the merge message.
+
+With this, a later invocation of `git commit` to record the
+result of hand resolving would be able to notice that:
+
+. We should be first resolving `kernel/` subproject, not the
+  whole thing.
+. The remote `HEAD` is `3ee68c4\...` commit.
+. The merge message is `Merge git://\.../linux-2.6 into kernel/`.
+
+and would make a merge commit, and register that resulting
+commit in the index file using `update-index \--bind` instead of
+updating *any* branch head.
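How a later `git commit` might tell the two kinds of merge apart can be sketched as follows. This assumes the `MERGE_HEAD` layout suggested above (an object name, a tab, and the subdirectory) for a subproject merge, and a bare object name for an ordinary merge; `describe_merge_head` is a hypothetical helper name used only for illustration.

```shell
# describe_merge_head: read the first line of a MERGE_HEAD-style file
# on stdin.  A bare object name means an ordinary (toplevel) merge; an
# object name followed by a tab and a subdirectory is the subproject
# form sketched above.
describe_merge_head () {
	tab=$(printf '\t')
	# Split on tab only; tolerate a missing trailing newline.
	IFS="$tab" read -r commit path || :
	if test -n "$path"
	then
		echo "subproject merge of $commit into $path"
	else
		echo "toplevel merge of $commit"
	fi
}

printf '3ee68c4af3fd7228c1be63254b9f884614f9ebb2\tkernel/\n' |
describe_merge_head
```

For the sample line this reports a subproject merge into `kernel/`, which is the cue to resolve that part first and to register the resulting commit in the index rather than update a branch head.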
+
+
+Management of Subprojects
+-------------------------
+
+While the above as a mechanism would support version controlling
+of subprojects as a part of *one* larger toplevel project, it
+probably is worth pointing out that having a separate repository
+to manage the subproject independently would be a good idea.
+The same subproject can be incorporated into more than one
+toplevel project, and after all, a subproject should be
+something that can stand on its own.  In our example scenario,
+the `kernel/` project is used as a subproject for the "gadget"
+product, but at the same time, the organization that runs the
+"gadget" project may use Linux on their development machines,
+and have their own kernel hackers, not necessarily related to
+the use of the kernel in the "gadget" product.
+
+What this suggests is that not only do we need to be able to
+pull the kernel development history *into* the subproject of the
+"gadget" project, but we also need to be able to push the
+development history of the kernel part alone *out* *of* the
+"gadget" project to another repository that deals only with the
+kernel part.
+
+It might go this way.  First the setup:
+
+------------
+$ git clone git://git.kernel.org/.../linux-2.6 Linux
+$ ls -dF *
+cloned/      combined/    current/     Linux/
+------------
+
+That is, in addition to the `combined/` which we have been using
+to develop the "gadget" product in, we now have a repository for
+the kernel, cloned from Linus.  In the previous section, we have
+outlined how we update the kernel subproject part of `combined/`
+repository from the `kernel.org` repository.  The same procedure
+would work for pulling from `Linux/` repository here.
+
+We are now going the other way: propagating the kernel work done
+in the "gadget" project repository `combined/` back to `Linux/`.
+We might do this at the lowest level:
+
+------------
+$ cd combined
+$ git cat-file commit HEAD |
+  sed -ne 's|^bind \([0-9a-f]*\) kernel/$|\1|p' >.git/refs/heads/linux26
+$ git push ../Linux linux26:master
+------------
+
+Or, more realistically, since the `Linux` project might already
+have its own commits on its `master`:
+
+------------
+$ cd Linux
+$ git pull ../combined linux26
+------------
+
+Either way we would need an easy way to maintain the `linux26`
+branch in the above example, and that will have to be part of
+the wrapper scripts like `git commit` (more likely, that would
+be a job for `git commit \--subproject`) for usability's
+sake; in other words, the `cat-file commit` piped to `sed` above
+is not something the end user would do, but something that is
+done by the wrapper scripts.
+
+Hopefully the people who work in `Linux/` repository would run
+`format-patch` and feed their changes back to the kernel
+community.