This really is very basic stuff, no branches, no merging, no CVS imports. Let's start small.
Linus Torvalds
20 years ago
1 changed files with 413 additions and 0 deletions
@@ -0,0 +1,413 @@
A short git tutorial
====================
May 2005


Introduction
------------

This is trying to be a short tutorial on setting up and using a git
archive, mainly because being hands-on and using explicit examples is
often the best way of explaining what is going on.

In normal life, most people wouldn't use the "core" git programs
directly, but rather script around them to make them more palatable.
Understanding the core git stuff may help some people get those scripts
done, though, and it may also be instructive in helping people
understand what it is that the higher-level helper scripts are actually
doing.

The core git is often called "plumbing", with the prettier user
interfaces on top of it called "porcelain". You may want to know what
the plumbing does for when the porcelain isn't flushing...


Creating a git archive
----------------------

Creating a new git archive couldn't be easier: all git archives start
out empty, and the only thing you need to do is find yourself a
subdirectory that you want to use as a working tree - either an empty
one for a totally new project, or an existing working tree that you want
to import into git.

For our first example, we're going to start a totally new archive from
scratch, with no pre-existing files, and we'll call it "git-tutorial".
To start up, create a subdirectory for it, change into that
subdirectory, and initialize the git infrastructure with "git-init-db":

	mkdir git-tutorial
	cd git-tutorial
	git-init-db

to which git will reply

	defaulting to local storage area

which is just git's way of saying that you haven't been doing anything
strange, and that it will have created a local .git directory setup for
your new project. You will now have a ".git" directory, and you can
inspect that with "ls". For your new empty project, ls should show you
three entries:

 - a symlink called HEAD, pointing to "refs/heads/master"

   Don't worry about the fact that the file that the HEAD link points to
   doesn't even exist yet - you haven't created the commit that will
   start your HEAD development branch yet.

 - a subdirectory called "objects", which will contain all the git SHA1
   objects of your project. You should never have any real reason to
   look at the objects directly, but you might want to know that these
   objects are what contains all the real _data_ in your repository.

 - a subdirectory called "refs", which contains references to objects.

   In particular, the "refs" subdirectory will contain two other
   subdirectories, named "heads" and "tags" respectively. They do
   exactly what their names imply: they contain references to any number
   of different "heads" of development (aka "branches"), and to any
   "tags" that you have created to name specific versions of your
   repository.

   One note: the special "master" head is the default branch, which is
   why the .git/HEAD file was created as a symlink to it even if it
   doesn't yet exist. Basically, the HEAD link is supposed to always
   point to the branch you are working on right now, and you always
   start out expecting to work on the "master" branch.

   However, this is only a convention, and you can name your branches
   anything you want, and don't have to ever even _have_ a "master"
   branch. A number of the git tools will assume that .git/HEAD is
   valid, though.

   [ Implementation note: an "object" is identified by its 160-bit SHA1
     hash, aka "name", and a reference to an object is always the 40-byte
     hex representation of that SHA1 name. The files in the "refs"
     subdirectory are expected to contain these hex references (usually
     with a final '\n' at the end), and you should thus expect to see a
     number of 41-byte files containing these references in these refs
     subdirectories when you actually start populating your tree ]

You have now created your first git archive. Of course, since it's
empty, that's not very useful, so let's start populating it with data.
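
If you want to poke at the same layout with a present-day git, here is a
rough sketch. It rests on a few assumptions that are not part of the
walkthrough above: a modern "git" binary on your PATH (where the dashed
command names have been folded into one "git" wrapper), a scratch
directory from "mktemp", and an explicit "git symbolic-ref" call to pin
the branch name to "master", since the default name is configurable
these days. Modern git also stores HEAD as a small file holding a
symbolic ref rather than as a symlink.

```shell
# Sketch only: assumes a modern git; the scratch directory is hypothetical.
set -e
work=$(mktemp -d)
mkdir "$work/git-tutorial"
cd "$work/git-tutorial"

# "git init" is the modern spelling of git-init-db.
git init -q

# Pin HEAD to "master" explicitly, whatever the local default branch is.
git symbolic-ref HEAD refs/heads/master

# HEAD names a branch that does not exist yet, just as described above.
head_target=$(git symbolic-ref HEAD)
echo "HEAD -> $head_target"

# Inspect the freshly created infrastructure.
ls .git
```

Expect "ls" to show more than three entries nowadays, but HEAD, objects
and refs should still be among them.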


Populating a git archive
------------------------

We'll keep this simple and stupid, so we'll start off with populating a
few trivial files just to get a feel for it.

Start off with just creating any random files that you want to maintain
in your git archive. We'll start off with a few bad examples, just to
get a feel for how this works:

	echo "Hello World" > a
	echo "Silly example" > b

you have now created two files in your working directory, but to
actually check in your hard work, you will have to go through two steps:

 - fill in the "cache" aka "index" file with the information about your
   working directory state

 - commit that index file as an object.

The first step is trivial: when you want to tell git about any changes
to your working directory, you use the "git-update-cache" program. That
program normally just takes a list of filenames you want to update, but
to avoid trivial mistakes, it refuses to add new entries to the cache
(or remove existing ones) unless you explicitly tell it that you're
adding a new entry with the "--add" flag (or removing an entry with the
"--remove" flag).

So to populate the index with the two files you just created, you can do

	git-update-cache --add a b

and you have now told git to track those two files.
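
As a hedged illustration with a modern git, where "git-update-cache" is
spelled "git update-index", the following sketch adds the two files and
then lists what the index holds. The scratch directory and the use of
"git ls-files" to peek at the index are assumptions of this example, not
something the tutorial itself relies on.

```shell
# Sketch only: assumes a modern git; run in a throwaway directory.
set -e
work=$(mktemp -d)
cd "$work"
git init -q

echo "Hello World" > a
echo "Silly example" > b

# New entries are refused unless --add is given explicitly.
git update-index --add a b

# Show mode, object name, stage and path for every index entry.
git ls-files --stage

tracked_count=$(git ls-files | wc -l | tr -d ' ')
echo "entries in index: $tracked_count"
```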

In fact, as you did that, if you now look into your object directory,
you'll notice that git will have added two new objects to the object
store. If you did exactly the steps above, you should now be able to do

	ls .git/objects/??/*

and see two files:

	.git/objects/55/7db03de997c86a4a028e1ebd3a1ceb225be238
	.git/objects/f2/4c74a2e500f5ee1332c86b94199f52b1d1d962

which correspond to the objects with SHA1 names of 557db... and f24c7..
respectively.
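
The listing suggests how a name maps to a file: the first two hex digits
become the directory, and the remaining 38 become the filename. A small
sketch of that split (the variable names are only for illustration):

```shell
# Sketch: derive the object file path from a 40-character object name.
name=557db03de997c86a4a028e1ebd3a1ceb225be238

# First two hex digits name the fan-out directory...
dir=$(printf '%s' "$name" | cut -c1-2)
# ...and the other 38 name the file inside it.
rest=$(printf '%s' "$name" | cut -c3-)

object_path=".git/objects/$dir/$rest"
echo "$object_path"
```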

If you want to, you can use "git-cat-file" to look at those objects, but
you'll have to use the object name, not the filename of the object:

	git-cat-file -t 557db03de997c86a4a028e1ebd3a1ceb225be238

where the "-t" tells git-cat-file to tell you what the "type" of the
object is. Git will tell you that you have a "blob" object (ie just a
regular file), and you can see the contents with

	git-cat-file "blob" 557db03de997c86a4a028e1ebd3a1ceb225be238

which will print out "Hello World". The object 557db... is nothing
more than the contents of your file "a".

[ Digression: don't confuse that object with the file "a" itself. The
  object is literally just those specific _contents_ of the file, and
  however much you later change the contents in file "a", the object we
  just looked at will never change. Objects are immutable. ]
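
To make the "name is derived from contents" point concrete, here is a
sketch that recomputes the name of the "Hello World" blob without git at
all. It assumes git's blob naming scheme (SHA1 over a short
"blob <size>" header, a NUL byte, then the contents) and a "sha1sum"
tool on your PATH; both are background facts and assumptions of this
example rather than anything stated in the text above.

```shell
# Sketch: blob name = SHA1 of "blob <size>\0" followed by the contents.
# Assumes a sha1sum utility is available.
set -e
content='Hello World'

# Size in bytes, including the trailing newline that echo wrote into "a".
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

blob_name=$(
	{ printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } |
	sha1sum | cut -d' ' -f1
)
echo "$blob_name"
```

The result should match the 557db... name shown earlier, which is why
the object can never change: different contents would simply be a
different object.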

Anyway, as we mentioned previously, you normally never actually take a
look at the objects themselves, and typing long 40-character hex SHA1
names is not something you'd normally want to do. The above digression
was just to show that "git-update-cache" did something magical, and
actually saved away the contents of your files into the git content
store.

Updating the cache did something else too: it created a ".git/index"
file. This is the index that describes your current working tree, and
something you should be very aware of. Again, you normally never worry
about the index file itself, but you should be aware of the fact that
you have not actually really "checked in" your files into git so far,
you've only _told_ git about them.

However, since git knows about them, you can now start using some of the
most basic git commands to manipulate the files or look at their status.

In particular, let's not even check in the two files into git yet, we'll
start off by adding another line to "a" first:

	echo "It's a new day for git" >> a

and you can now, since you told git about the previous state of "a", ask
git what has changed in the tree compared to your old index, using the
"git-diff-files" command:

	git-diff-files

oops. That wasn't very readable. It just spit out its own internal
version of a "diff", but that internal version really just tells you
that it has noticed that "a" has been modified, and that the old object
contents it had have been replaced with something else.

To make it readable, we can tell git-diff-files to output the
differences as a patch, using the "-p" flag:

	git-diff-files -p

which will spit out

	diff --git a/a b/a
	--- a/a
	+++ b/a
	@@ -1 +1,2 @@
	 Hello World
	+It's a new day for git

ie the diff of the change we caused by adding another line to "a".

In other words, git-diff-files always shows us the difference between
what is recorded in the index, and what is currently in the working
tree. That's very useful.
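
Here is the index-versus-working-tree comparison replayed as a sketch
with a modern git ("git diff-files" and "git update-index" being the
current spellings). The scratch directory and the "--name-only" option
used to capture just the changed filename are choices made for this
example.

```shell
# Sketch only: assumes a modern git; run in a throwaway directory.
set -e
work=$(mktemp -d)
cd "$work"
git init -q

echo "Hello World" > a
echo "Silly example" > b
git update-index --add a b

# Change the working tree without telling the index about it.
echo "It's a new day for git" >> a

# Which paths differ between the index and the working tree?
changed=$(git diff-files --name-only)
echo "differs from index: $changed"

# The same comparison, as a human-readable patch.
git diff-files -p
```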


Committing git state
--------------------

Now, we want to go to the next stage in git, which is to take the files
that git knows about in the index, and commit them as a real tree. We do
that in two phases: creating a "tree" object, and committing that "tree"
object as a "commit" object together with an explanation of what the
tree was all about, along with information of how we came to that state.

Creating a tree object is trivial, and is done with "git-write-tree".
There are no options or other input: git-write-tree will take the
current index state, and write an object that describes that whole
index. In other words, we're now tying together all the different
filenames with their contents (and their permissions), and we're
creating the equivalent of a git "directory" object:

	git-write-tree

and this will just output the name of the resulting tree, in this case
(if you have done exactly as I've described) it should be

	3ede4ed7e895432c0a247f09d71a76db53bd0fa4

which is another incomprehensible object name. Again, if you want to,
you can use "git-cat-file -t 3ede4.." to see that this time the object
is not a "blob" object, but a "tree" object (you can also use
git-cat-file to actually output the raw object contents, but you'll see
mainly a binary mess, so that's less interesting).

However - normally you'd never use "git-write-tree" on its own, because
normally you always commit a tree into a commit object using the
"git-commit-tree" command. In fact, it's easier to not actually use
git-write-tree on its own at all, but to just pass its result in as an
argument to "git-commit-tree".

"git-commit-tree" normally takes several arguments - it wants to know
what the _parent_ of a commit was, but since this is the first commit
ever in this new archive, and it has no parents, we only need to pass in
the tree ID. However, git-commit-tree also wants to get a commit message
on its standard input, and it will write out the resulting ID for the
commit to its standard output.

And this is where we start using the .git/HEAD file. The HEAD file is
supposed to contain the reference to the top-of-tree, and since that's
exactly what git-commit-tree spits out, we can do this all with a simple
shell pipeline:

	echo "Initial commit" | git-commit-tree $(git-write-tree) > .git/HEAD

which will say:

	Committing initial tree 3ede4ed7e895432c0a247f09d71a76db53bd0fa4

just to warn you about the fact that it created a totally new commit
that is not related to anything else. Normally you do this only _once_
for a project ever, and all later commits will be parented on top of an
earlier commit, and you'll never see this "Committing initial tree"
message ever again.
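
A hedged sketch of the same two-phase commit with a modern git follows.
Two things differ from the pipeline above and are assumptions of this
example: the author and committer identity is supplied through
environment variables with made-up values, and the new commit is
recorded with "git update-ref" instead of redirecting into .git/HEAD,
because HEAD is nowadays a file containing a symbolic ref and
overwriting it by hand would clobber that.

```shell
# Sketch only: assumes a modern git; identity values are placeholders.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

echo "Hello World" > a
echo "Silly example" > b
git update-index --add a b

# Phase one: turn the current index into a tree object.
tree=$(git write-tree)

# Phase two: wrap the tree in a parentless commit, message on stdin.
commit=$(echo "Initial commit" | git commit-tree "$tree")

# Record the commit as the tip of whatever branch HEAD names.
git update-ref HEAD "$commit"

git cat-file -t "$tree"
git cat-file -t "$commit"
```

Don't expect the tree name to match the 3ede4... value quoted above on a
current git; the two blob names are the stable part.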


Making a change
---------------

Remember how we did the "git-update-cache" on file "a" and then we
changed "a" afterwards, and could compare the new state of "a" with the
state we saved in the index file?

Further, remember how I said that "git-write-tree" writes the contents
of the _index_ file to the tree, and thus what we just committed was in
fact the _original_ contents of the file "a", not the new ones. We did
that on purpose, to show the difference between the index state, and the
state in the working directory, and how they don't have to match, even
when we commit things.

As before, if we do "git-diff-files -p" in our git-tutorial project,
we'll still see the same difference we saw last time: the index file
hasn't changed by the act of committing anything. However, now that we
have committed something, we can also learn to use a new command:
"git-diff-cache".

Unlike "git-diff-files", which showed the difference between the index
file and the working directory, "git-diff-cache" shows the differences
between a committed _tree_ and the index file. In other words,
git-diff-cache wants a tree to be diffed against, and before we did the
commit, we couldn't do that, because we didn't have anything to diff
against.

But now we can do

	git-diff-cache -p HEAD

(where "-p" has the same meaning as it did in git-diff-files), and it
will show us the same difference, but for a totally different reason.
Now we're not comparing against the index file, we're comparing against
the tree we just wrote. It just so happens that those two are obviously
the same.

"git-diff-cache" also has a specific flag "--cached", which is used to
tell it to show the differences purely with the index file, and ignore
the current working directory state entirely. Since we just wrote the
index file to HEAD, doing "git-diff-cache --cached -p HEAD" should thus
return an empty set of differences, and that's exactly what it does.

However, our next step is to commit the _change_ we did, and again, to
understand what's going on, keep in mind the difference between "working
directory contents", "index file" and "committed tree". We have changes
in the working directory that we want to commit, and we always have to
work through the index file, so the first thing we need to do is to
update the index cache:

	git-update-cache a

(note how we didn't need the "--add" flag this time, since git knew
about the file already).

Note what happens to the different git-diff-xxx versions here. After
we've updated "a" in the index, "git-diff-files -p" now shows no
differences, but "git-diff-cache -p HEAD" still _does_ show that the
current state is different from the state we committed. In fact, now
"git-diff-cache" shows the same difference whether we use the "--cached"
flag or not, since now the index is coherent with the working directory.
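
The three-way distinction is easy to lose track of, so here is a sketch
that captures all three comparisons after the index update. It assumes a
modern git, where "git-diff-cache" is spelled "git diff-index"; the
identity values, the scratch directory and the "--name-only" option are
choices made for this example.

```shell
# Sketch only: assumes a modern git; identity values are placeholders.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

# Commit the original contents of "a" and "b".
echo "Hello World" > a
echo "Silly example" > b
git update-index --add a b
first=$(echo "Initial commit" | git commit-tree "$(git write-tree)")
git update-ref HEAD "$first"

# Change "a", then bring the index up to date (no --add needed).
echo "It's a new day for git" >> a
git update-index a

# Index vs working directory: nothing left to report.
files_diff=$(git diff-files --name-only)
# Committed tree vs working directory: "a" differs.
tree_diff=$(git diff-index --name-only HEAD)
# Committed tree vs index only: "a" differs here too.
cached_diff=$(git diff-index --cached --name-only HEAD)

echo "diff-files:          '$files_diff'"
echo "diff-index:          '$tree_diff'"
echo "diff-index --cached: '$cached_diff'"
```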

Now, since we've updated "a" in the index, we can commit the new
version. We could do it by writing the tree by hand, and committing the
tree (this time we'd have to use the "-p HEAD" flag to tell commit that
the HEAD was the _parent_ of the new commit, and that this wasn't an
initial commit any more), but the fact is, git has a simple helper
script for doing all of the non-initial commits that does all of this
for you, and starts up an editor to let you write your commit message
yourself, so let's just use that:

	git-commit-script

Write whatever message you want, and all the lines that start with '#'
will be pruned out, and the rest will be used as the commit message for
the change. If you decide you don't want to commit anything after all at
this point (you can continue to edit things and update the cache), you
can just leave an empty message. Otherwise git-commit-script will commit
the change for you.

(Btw, current versions of git will consider the change in question to be
so big that it's considered a whole new file, since the diff is actually
bigger than the file. So the helpful comments that git-commit-script
tells you for this example will say that you deleted and re-created the
file "a". For a less contrived example, these things are usually more
obvious).

You've now made your first real git commit. And if you're interested in
looking at what git-commit-script really does, feel free to investigate:
it's a few very simple shell scripts to generate the helpful (?) commit
message headers, and a few one-liners that actually do the commit itself.


Checking it out
---------------

While creating changes is useful, it's even more useful if you can tell
later what changed. The most useful command for this is another of the
"diff" family, namely "git-diff-tree".

git-diff-tree can be given two arbitrary trees, and it will tell you the
differences between them. Perhaps even more commonly, though, you can
give it just a single commit object, and it will figure out the parent
of that commit itself, and show the difference directly. Thus, to get
the same diff that we've already seen several times, we can now do

	git-diff-tree -p HEAD

(again, "-p" means to show the difference as a human-readable patch),
and it will show what the last commit (in HEAD) actually changed.

More interestingly, you can also give git-diff-tree the "-v" flag, which
tells it to also show the commit message and author and date of the
commit, and you can tell it to show a whole series of diffs.
Alternatively, you can tell it to be "silent", and not show the diffs at
all, but just show the actual commit message.

In fact, together with the "git-rev-list" program (which generates a
list of revisions), git-diff-tree ends up being a veritable fount of
changes. A trivial (but very useful) script called "git-whatchanged" is
included with git which does exactly this, and shows a log of recent
activity.

To see the whole history of our pitiful little git-tutorial project, we
can do

	git-whatchanged -p --root HEAD

(the "--root" flag is a flag to git-diff-tree to tell it to show the
initial aka "root" commit as a diff too), and you will see exactly what
has changed in the repository over its short history.
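
The rev-list plus diff-tree combination can be sketched by hand with a
modern git. This is only an approximation of what a script like
git-whatchanged does, not its actual contents; the two commits are made
with "git commit-tree" so the example needs no editor, and the identity
values and scratch directory are placeholders.

```shell
# Sketch only: assumes a modern git; identity values are placeholders.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

# Root commit with the original files.
echo "Hello World" > a
echo "Silly example" > b
git update-index --add a b
first=$(echo "Initial commit" | git commit-tree "$(git write-tree)")
git update-ref HEAD "$first"

# Second commit, parented on the first with -p.
echo "It's a new day for git" >> a
git update-index a
second=$(echo "Add a line" | git commit-tree "$(git write-tree)" -p "$first")
git update-ref HEAD "$second"

# Feed every revision to diff-tree: patch, message, and the root commit too.
git rev-list HEAD | git diff-tree --stdin --root -p -v

commit_count=$(git rev-list HEAD | wc -l | tr -d ' ')
last_changed=$(git diff-tree --no-commit-id --name-only -r HEAD)
echo "$commit_count commits, last one touched: $last_changed"
```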

With that, you should now be having some inkling of what git does, and
can explore on your own.

[ to be continued.. cvs2git, tagging versions, branches, merging.. ]