A tutorial introduction to git
==============================
This tutorial explains how to import a new project into git, make
changes to it, and share changes with other developers.
First, note that you can get documentation for a command such as "git
diff" with:
------------------------------------------------
$ man git-diff
------------------------------------------------
It is a good idea to introduce yourself to git with your name and
public email address before doing any operation. The easiest
way to do so is:
------------------------------------------------
$ git config --global user.name "Your Name Comes Here"
$ git config --global user.email you@yourdomain.example.com
------------------------------------------------
Importing a new project
-----------------------
Assume you have a tarball project.tar.gz with your initial work. You
can place it under git revision control as follows.
------------------------------------------------
$ tar xzf project.tar.gz
$ cd project
$ git init
------------------------------------------------
Git will reply
------------------------------------------------
Initialized empty Git repository in .git/
------------------------------------------------
You've now initialized the working directory--you may notice a new
directory created, named ".git". Tell git that you want it to track
every file under the current directory (note the '.') with:
------------------------------------------------
$ git add .
------------------------------------------------
Finally,
------------------------------------------------
$ git commit
------------------------------------------------
will prompt you for a commit message, then record the current state
of all the files to the repository.
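As a rough end-to-end sketch of the import steps above, the following builds a throwaway project in a temporary directory instead of unpacking a tarball. The file names, the locally configured identity, and the use of `-m` (which supplies the commit message without opening an editor) are assumptions made for illustration.

```shell
# Scratch project standing in for the unpacked tarball (hypothetical files).
project=$(mktemp -d)
cd "$project"
echo 'int main(void) { return 0; }' > main.c
echo 'all: main' > Makefile

git init -q
# Identity kept local to this repository so global settings are untouched.
git config user.name "Your Name Comes Here"
git config user.email you@yourdomain.example.com

git add .                          # stage every file under the current directory
git commit -q -m "Initial import"  # -m supplies the message without an editor

tracked=$(git ls-files | wc -l | tr -d ' ')
commits=$(git rev-list --count HEAD)
echo "tracked files: $tracked, commits: $commits"
```

Running `git log` in that directory afterwards should list the single import commit.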
Making changes
--------------
Try modifying some files, then run
------------------------------------------------
$ git diff
------------------------------------------------
to review your changes. When you're done, tell git that you
want the updated contents of these files in the commit and then
make a commit, like this:
------------------------------------------------
$ git add file1 file2 file3
$ git commit
------------------------------------------------
This will again prompt you for a message describing the change, and then
record the new versions of the files you listed.
Alternatively, instead of running `git add` beforehand, you can use
------------------------------------------------
$ git commit -a
------------------------------------------------
which will automatically notice modified (but not new) files.
A note on commit messages: Though not required, it's a good idea to
begin the commit message with a single short (under 50 characters)
line summarizing the change, followed by a blank line and then a more
thorough description. Tools that turn commits into email, for
example, use the first line on the Subject: line and the rest of the
commit in the body.
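A minimal sketch of that message layout, assuming a scratch repository and made-up message text. Passing `-m` twice is one way to get the summary line and the description as separate paragraphs; the `%s` and `%b` format placeholders of `git log` then print the subject and the body.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity, local only
git config user.email user@example.com

echo "hello" > greeting.txt
git add greeting.txt
# Each -m becomes its own paragraph, so the description follows a blank line.
git commit -q -m "Add greeting file" \
              -m "Keep the startup text in its own file so that translators can edit it without touching code."

subject=$(git log -1 --format=%s)   # the line mail tools would use as Subject:
body=$(git log -1 --format=%b)      # everything after the blank line
echo "Subject: $subject"
echo "$body"
```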
Git tracks content not files
----------------------------
With git you have to explicitly "add" all the changed _content_ you
want to commit together. This can be done in a few different ways:
1) By using 'git add <file_spec>...'
This can be performed multiple times before a commit. Note that this
is not only for adding new files. Even modified files must be
added to the set of changes about to be committed. The "git status"
command gives you a summary of what is included so far for the
next commit. When done you should use the 'git commit' command to
make it real.
Note: don't forget to 'add' a file again if you modified it after the
first 'add' and before 'commit'. Otherwise only the previously added
state of that file will be committed. This is because git tracks
content, so what you're really 'adding' to the commit is the *content*
of the file in the state it is in when you 'add' it.
2) By using 'git commit -a' directly
This is a quick way to automatically 'add' the content from all files
that were modified since the previous commit, and perform the actual
commit without having to separately 'add' them beforehand. This will
not add content from new files, i.e. files that were never added before.
Those files still have to be added explicitly before performing a
commit.
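The note under 1) can be checked with a small experiment. This is only a sketch in a scratch repository; the file name and contents are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity, local only
git config user.email user@example.com

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "version 2" > notes.txt
git add notes.txt                # stages the content "version 2"
echo "version 3" > notes.txt     # edited again, but not added again
git commit -q -m "Update notes"

committed=$(git show HEAD:notes.txt)   # content recorded by the commit
working=$(cat notes.txt)               # content present only in the working tree
echo "committed: $committed"
echo "working:   $working"
```

The commit holds the content as it was at the time of 'git add'; the later edit stays in the working tree until it is added and committed in turn.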
But here's a twist. If you do 'git commit <file1> <file2> ...' then only
the changes belonging to those explicitly specified files will be
committed, entirely bypassing the current "added" changes. Those "added"
changes will still remain available for a subsequent commit though.
However, for normal usage you only have to remember 'git add' + 'git commit'
and/or 'git commit -a'.
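To see the "twist" in action, here is a hedged sketch with two hypothetical files. One is added, the other is named directly on the 'git commit' command line.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity, local only
git config user.email user@example.com

echo "a" > a.txt
echo "b" > b.txt
git add .
git commit -q -m "Add a and b"

echo "a changed" > a.txt
echo "b changed" > b.txt
git add a.txt                            # a.txt is now "added"
git commit -q -m "Change b only" b.txt   # names b.txt explicitly

in_commit=$(git diff --name-only HEAD^ HEAD)    # files changed by that commit
still_staged=$(git diff --cached --name-only)   # changes still waiting in the index
echo "committed: $in_commit"
echo "still added: $still_staged"
```

Only b.txt ends up in the commit, while the added change to a.txt remains available for a subsequent commit.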
Viewing the changelog
---------------------
At any point you can view the history of your changes using
------------------------------------------------
$ git log
------------------------------------------------
If you also want to see complete diffs at each step, use
------------------------------------------------
$ git log -p
------------------------------------------------
Often an overview of each change is useful to get a feel for
each step:
------------------------------------------------
$ git log --stat --summary
------------------------------------------------
Managing branches
-----------------
A single git repository can maintain multiple branches of
development. To create a new branch named "experimental", use
------------------------------------------------
$ git branch experimental
------------------------------------------------
If you now run
------------------------------------------------
$ git branch
------------------------------------------------
you'll get a list of all existing branches:
------------------------------------------------
experimental
* master
------------------------------------------------
The "experimental" branch is the one you just created, and the
"master" branch is a default branch that was created for you
automatically. The asterisk marks the branch you are currently on;
type
------------------------------------------------
$ git checkout experimental
------------------------------------------------
to switch to the experimental branch. Now edit a file, commit the
change, and switch back to the master branch:
------------------------------------------------
(edit file)
$ git commit -a
$ git checkout master
------------------------------------------------
Check that the change you made is no longer visible, since it was
made on the experimental branch and you're back on the master branch.
You can make a different change on the master branch:
------------------------------------------------
(edit file)
$ git commit -a
------------------------------------------------
At this point the two branches have diverged, with different changes
made in each. To merge the changes made in experimental into master, run
------------------------------------------------
$ git merge experimental
------------------------------------------------
If the changes don't conflict, you're done. If there are conflicts,
markers will be left in the problematic files showing the conflict;
------------------------------------------------
$ git diff
------------------------------------------------
will show this. Once you've edited the files to resolve the
conflicts,
------------------------------------------------
$ git commit -a
------------------------------------------------
will commit the result of the merge. Finally,
------------------------------------------------
$ gitk
------------------------------------------------
will show a nice graphical representation of the resulting history.
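The conflicting case can be reproduced in a scratch repository. The sketch below is an illustration only: the file name and values are made up, and the branch is explicitly named "master" because newer git versions may pick a different default name.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name this tutorial assumes
git config user.name "Example User"       # placeholder identity, local only
git config user.email user@example.com

echo "color = blue" > settings.txt
git add settings.txt
git commit -q -m "Add settings"

git branch experimental
git checkout -q experimental
echo "color = green" > settings.txt
git commit -q -a -m "Use green"

git checkout -q master
echo "color = red" > settings.txt
git commit -q -a -m "Use red"

# Both branches changed the same line, so the merge stops with a conflict.
git merge experimental > /dev/null 2>&1 || echo "merge reported a conflict"
markers=$(grep -c '^<<<<<<<' settings.txt)   # conflict markers left in the file

# Resolve by hand (here: keep one value), then commit the result of the merge.
echo "color = green" > settings.txt
git commit -q -a -m "Merge experimental, keeping green"
parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
echo "conflict markers seen: $markers, parents of the merge commit: $parents"
```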
At this point you could delete the experimental branch with
------------------------------------------------
$ git branch -d experimental
------------------------------------------------
This command ensures that the changes in the experimental branch are
already in the current branch.
If you develop on a branch crazy-idea, then regret it, you can always
delete the branch with
-------------------------------------
$ git branch -D crazy-idea
-------------------------------------
Branches are cheap and easy, so this is a good way to try something
out.
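The difference between `-d` and `-D` can be sketched as follows, assuming a scratch repository with made-up files. One branch is merged before deletion, the other is not.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name this tutorial assumes
git config user.name "Example User"       # placeholder identity, local only
git config user.email user@example.com

echo "base" > file.txt
git add file.txt
git commit -q -m "Base"
git branch experimental
git branch crazy-idea

git checkout -q experimental
echo "experimental work" >> file.txt
git commit -q -a -m "Experimental change"

git checkout -q crazy-idea
echo "wild" > idea.txt
git add idea.txt
git commit -q -m "Crazy idea"

git checkout -q master
git merge -q experimental     # master now contains the experimental change
git branch -d experimental    # succeeds: nothing would be lost
git branch -d crazy-idea 2>/dev/null || echo "-d refused: crazy-idea is not merged"
git branch -D crazy-idea      # deletes the branch regardless

remaining=$(git branch | tr -d ' *')
echo "remaining branches: $remaining"
```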
Using git for collaboration
---------------------------
Suppose that Alice has started a new project with a git repository in
/home/alice/project, and that Bob, who has a home directory on the
same machine, wants to contribute.
Bob begins with:
------------------------------------------------
$ git clone /home/alice/project myrepo
------------------------------------------------
This creates a new directory "myrepo" containing a clone of Alice's
repository. The clone is on an equal footing with the original
project, possessing its own copy of the original project's history.
Bob then makes some changes and commits them:
------------------------------------------------
(edit files)
$ git commit -a
(repeat as necessary)
------------------------------------------------
When he's ready, he tells Alice to pull changes from the repository
at /home/bob/myrepo. She does this with:
------------------------------------------------
$ cd /home/alice/project
$ git pull /home/bob/myrepo master
------------------------------------------------
This merges the changes from Bob's "master" branch into Alice's
current branch. If Alice has made her own changes in the meantime,
then she may need to manually fix any conflicts. (Note that the
"master" argument in the above command is actually unnecessary, as it
is the default.)
The "pull" command thus performs two operations: it fetches changes
from a remote branch, then merges them into the current branch.
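The exchange between Alice and Bob can be tried on a single machine. In this sketch a temporary directory stands in for /home, and the names, paths, and file contents are assumptions for illustration.

```shell
base=$(mktemp -d)                 # hypothetical stand-in for /home
mkdir -p "$base/alice" "$base/bob"

git init -q "$base/alice/project"
cd "$base/alice/project"
git symbolic-ref HEAD refs/heads/master   # use the branch name this tutorial assumes
git config user.name "Alice"
git config user.email alice@example.com
echo "first draft" > README
git add README
git commit -q -m "Start project"

# Bob clones Alice's repository and commits a change of his own.
git clone -q "$base/alice/project" "$base/bob/myrepo"
cd "$base/bob/myrepo"
git config user.name "Bob"
git config user.email bob@example.com
echo "Bob's addition" >> README
git commit -q -a -m "Extend README"

# Alice pulls: fetch from Bob's repository, then merge into her current branch.
cd "$base/alice/project"
git pull -q "$base/bob/myrepo" master
latest=$(git log -1 --format=%s)
echo "Alice's newest commit: $latest"
```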
When you are working in a small closely knit group, it is not
unusual to interact with the same repository over and over
again. By defining 'remote' repository shorthand, you can make
it easier:
------------------------------------------------
$ git remote add bob /home/bob/myrepo
------------------------------------------------
With this, Alice can perform the first operation alone using the
"git fetch" command, without merging the changes into her own branch:
-------------------------------------
$ git fetch bob
-------------------------------------
Unlike the longhand form, when Alice fetches from Bob using a
remote repository shorthand set up with `git remote`, what was
fetched is stored in a remote tracking branch, in this case
`bob/master`. So after this:
-------------------------------------
$ git log -p master..bob/master
-------------------------------------
shows a list of all the changes that Bob made since he branched from
Alice's master branch.
After examining those changes, Alice
could merge the changes into her master branch:
-------------------------------------
$ git merge bob/master
-------------------------------------
This `merge` can also be done by 'pulling from her own remote
tracking branch', like this:
-------------------------------------
$ git pull . remotes/bob/master
-------------------------------------
Note that git pull always merges into the current branch,
regardless of what else is given on the command line.
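A sketch of the fetch, inspect, then merge sequence, again using a temporary directory in place of /home. All paths and names are hypothetical.

```shell
base=$(mktemp -d)                 # hypothetical stand-in for /home
mkdir -p "$base/alice" "$base/bob"

git init -q "$base/alice/project"
cd "$base/alice/project"
git symbolic-ref HEAD refs/heads/master   # use the branch name this tutorial assumes
git config user.name "Alice"
git config user.email alice@example.com
echo "first draft" > README
git add README
git commit -q -m "Start project"

git clone -q "$base/alice/project" "$base/bob/myrepo"
cd "$base/bob/myrepo"
git config user.name "Bob"
git config user.email bob@example.com
echo "Bob's addition" >> README
git commit -q -a -m "Extend README"

cd "$base/alice/project"
git remote add bob "$base/bob/myrepo"
git fetch -q bob                                  # stores Bob's work as bob/master

incoming=$(git log --format=%s master..bob/master)  # what Bob has that Alice lacks
before_merge=$(git log -1 --format=%s master)       # master itself is untouched so far
git merge -q bob/master
after_merge=$(git log -1 --format=%s master)
echo "incoming: $incoming"
echo "master before merge: $before_merge, after merge: $after_merge"
```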
Later, Bob can update his repo with Alice's latest changes using
-------------------------------------
$ git pull
-------------------------------------
Note that he doesn't need to give the path to Alice's repository;
when Bob cloned Alice's repository, git stored the location of her
repository in the repository configuration, and that location is
used for pulls:
-------------------------------------
$ git config --get remote.origin.url
/home/alice/project
-------------------------------------
(The complete configuration created by git-clone is visible using
"git config -l", and the gitlink:git-config[1] man page
explains the meaning of each option.)
Git also keeps a pristine copy of Alice's master branch under the
name "origin/master":
-------------------------------------
$ git branch -r
origin/master
-------------------------------------
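What a fresh clone records about its origin can be inspected like this. The sketch assumes a temporary directory in place of /home, and the paths are hypothetical.

```shell
base=$(mktemp -d)                 # hypothetical stand-in for /home
mkdir -p "$base/alice" "$base/bob"

git init -q "$base/alice/project"
cd "$base/alice/project"
git symbolic-ref HEAD refs/heads/master   # use the branch name this tutorial assumes
git config user.name "Alice"
git config user.email alice@example.com
echo "first draft" > README
git add README
git commit -q -m "Start project"

git clone -q "$base/alice/project" "$base/bob/myrepo"
cd "$base/bob/myrepo"
origin_url=$(git config --get remote.origin.url)  # location used by a plain "git pull"
remote_branches=$(git branch -r)                  # Bob's copies of Alice's branches
echo "origin: $origin_url"
echo "$remote_branches"
```

The stored URL points at the repository Bob cloned from, that is, Alice's. Recent git versions may also list an `origin/HEAD` entry next to `origin/master`.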
If Bob later decides to work from a different host, he can still
perform clones and pulls using the ssh protocol:
-------------------------------------
$ git clone alice.org:/home/alice/project myrepo
-------------------------------------
Alternatively, git has a native protocol, or can use rsync or http;
see gitlink:git-pull[1] for details.
Git can also be used in a CVS-like mode, with a central repository
that various users push changes to; see gitlink:git-push[1] and
link:cvs-migration.html[git for CVS users].
Exploring history
-----------------
Git history is represented as a series of interrelated commits. We
have already seen that the git log command can list those commits.
Note that the first line of each git log entry also gives a name for the
commit:
-------------------------------------
$ git log
commit c82a22c39cbc32576f64f5c6b3f24b99ea8149c7
Author: Junio C Hamano <junkio@cox.net>
Date: Tue May 16 17:18:22 2006 -0700
merge-base: Clarify the comments on post processing.
-------------------------------------
We can give this name to git show to see the details about this
commit.
-------------------------------------
$ git show c82a22c39cbc32576f64f5c6b3f24b99ea8149c7
-------------------------------------
But there are other ways to refer to commits. You can use any initial
part of the name that is long enough to uniquely identify the commit:
-------------------------------------
$ git show c82a22c39c # the first few characters of the name are
# usually enough
$ git show HEAD # the tip of the current branch
$ git show experimental # the tip of the "experimental" branch
-------------------------------------
Most commits have one "parent" commit,
which points to the previous state of the project:
-------------------------------------
$ git show HEAD^ # to see the parent of HEAD
$ git show HEAD^^ # to see the grandparent of HEAD
$ git show HEAD~4 # to see the great-great grandparent of HEAD
-------------------------------------
Note that merge commits may have more than one parent:
-------------------------------------
$ git show HEAD^1 # show the first parent of HEAD (same as HEAD^)
$ git show HEAD^2 # show the second parent of HEAD
-------------------------------------
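The parent notation can be explored in a scratch repository. This sketch uses `git rev-parse`, which prints the full commit name a given expression refers to; the commits and file names are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name this tutorial assumes
git config user.name "Example User"       # placeholder identity, local only
git config user.email user@example.com

for n in 1 2 3; do
  echo "line $n" >> log.txt
  git add log.txt
  git commit -q -m "Commit $n"
done

parent=$(git log -1 --format=%s HEAD^)        # one step back
grandparent=$(git log -1 --format=%s HEAD^^)  # two steps back
caret=$(git rev-parse HEAD^^)
tilde=$(git rev-parse HEAD~2)                 # same commit, written differently

# Create a merge so that HEAD has two parents.
git branch side HEAD^
git checkout -q side
echo "side" > side.txt
git add side.txt
git commit -q -m "Side work"
git checkout -q master
git merge -q -m "Merge side" side

first=$(git log -1 --format=%s HEAD^1)    # the branch that was merged into
second=$(git log -1 --format=%s HEAD^2)   # the branch that was merged in
echo "parent: $parent, grandparent: $grandparent"
echo "first parent: $first, second parent: $second"
```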
You can also give commits names of your own; after running
-------------------------------------
$ git tag v2.5 1b2e1d63ff
-------------------------------------
you can refer to 1b2e1d63ff by the name "v2.5". If you intend to
share this name with other people (for example, to identify a release
version), you should create a "tag" object, and perhaps sign it; see
gitlink:git-tag[1] for details.
Any git command that needs to know a commit can take any of these
names. For example:
-------------------------------------
$ git diff v2.5 HEAD # compare the current HEAD to v2.5
$ git branch stable v2.5 # start a new branch named "stable" based
# at v2.5
$ git reset --hard HEAD^ # reset your current branch and working
# directory to its state at HEAD^
-------------------------------------
Be careful with that last command: in addition to losing any changes
in the working directory, it will also remove all later commits from
this branch. If this branch is the only branch containing those
commits, they will be lost. Also, don't use "git reset" on a
publicly-visible branch that other developers pull from, as it will
force needless merges on other developers to clean up the history.
If you need to undo changes that you have pushed, use gitlink:git-revert[1]
instead.
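As a sketch of the safer alternative, the following undoes a commit with "git revert" in a scratch repository. The file and commit messages are hypothetical, and `--no-edit` is used only to accept the default message without opening an editor.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity, local only
git config user.email user@example.com

echo "good" > app.txt
git add app.txt
git commit -q -m "Good change"
echo "bad" >> app.txt
git commit -q -a -m "Bad change"

# Undo the last commit by recording a new commit that reverses it.
git revert --no-edit HEAD > /dev/null

total=$(git rev-list --count HEAD)   # history grows; nothing disappears
content=$(cat app.txt)
newest=$(git log -1 --format=%s)
echo "commits: $total, newest: $newest, file content: $content"
```

Because the unwanted commit stays in the history, developers who already pulled it only receive one more ordinary commit.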
The git grep command can search for strings in any version of your
project, so
-------------------------------------
$ git grep "hello" v2.5
-------------------------------------
searches for all occurrences of "hello" in v2.5.
If you leave out the commit name, git grep will search any of the
files it manages in your current directory. So
-------------------------------------
$ git grep "hello"
-------------------------------------
is a quick way to search just the files that are tracked by git.
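Both forms of "git grep" can be compared in a scratch repository. The files, their contents, and the tag are assumptions for this sketch.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity, local only
git config user.email user@example.com

echo "hello world" > greet.txt
git add greet.txt
git commit -q -m "Add greeting"
git tag v2.5                     # name the current commit

echo "goodbye world" > greet.txt
echo "hello again" > other.txt
git add greet.txt other.txt
git commit -q -m "Change greeting, add other"
echo "hello from an untracked file" > scratch.txt   # never added

in_tag=$(git grep "hello" v2.5)   # searches the files as they were in v2.5
in_tree=$(git grep "hello")       # searches tracked files in the working directory
echo "$in_tag"
echo "$in_tree"
```

The untracked scratch.txt does not show up in either search.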
Many git commands also take sets of commits, which can be specified
in a number of ways. Here are some examples with git log:
-------------------------------------
$ git log v2.5..v2.6 # commits between v2.5 and v2.6
$ git log v2.5.. # commits since v2.5
$ git log --since="2 weeks ago" # commits from the last 2 weeks
$ git log v2.5.. Makefile # commits since v2.5 which modify
# Makefile
-------------------------------------
You can also give git log a "range" of commits where the first is not
necessarily an ancestor of the second; for example, if the tips of
the branches "stable" and "experimental" diverged from a common
commit some time ago, then
-------------------------------------
$ git log stable..experimental
-------------------------------------
will list commits made in the experimental branch but not in the
stable branch, while
-------------------------------------
$ git log experimental..stable
-------------------------------------
will show the list of commits made on the stable branch but not
the experimental branch.
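A small sketch of such a range on two diverged branches, using a scratch repository with made-up commits:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name this tutorial assumes
git config user.name "Example User"       # placeholder identity, local only
git config user.email user@example.com

echo "base" > base.txt
git add base.txt
git commit -q -m "Common base"
git branch stable
git branch experimental

git checkout -q stable
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Stable fix"

git checkout -q experimental
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Experimental feature"

only_experimental=$(git log --format=%s stable..experimental)
only_stable=$(git log --format=%s experimental..stable)
echo "in experimental but not stable: $only_experimental"
echo "in stable but not experimental: $only_stable"
```

The common base commit appears in neither list, since both branches contain it.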
The "git log" command has a weakness: it must present commits in a
list. When the history has lines of development that diverged and
then merged back together, the order in which "git log" presents
those commits is meaningless.
Most projects with multiple contributors (such as the Linux kernel,
or git itself) have frequent merges, and gitk does a better job of
visualizing their history. For example,
-------------------------------------
$ gitk --since="2 weeks ago" drivers/
-------------------------------------
allows you to browse any commits from the last 2 weeks
that modified files under the "drivers" directory. (Note: you can
adjust gitk's fonts by holding down the control key while pressing
"-" or "+".)
Finally, most commands that take filenames will optionally allow you
to precede any filename by a commit, to specify a particular version
of the file:
-------------------------------------
$ git diff v2.5:Makefile HEAD:Makefile.in
-------------------------------------
You can also use "git show" to see any such file:
-------------------------------------
$ git show v2.5:Makefile
-------------------------------------
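The commit:filename notation can be tried out as follows. This is a sketch in a scratch repository; the Makefile contents and the tag are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity, local only
git config user.email user@example.com

echo "CFLAGS = -O1" > Makefile
git add Makefile
git commit -q -m "Add Makefile"
git tag v2.5

echo "CFLAGS = -O2" > Makefile
git commit -q -a -m "Raise optimization level"

old=$(git show v2.5:Makefile)                      # the file as of v2.5
changes=$(git diff v2.5:Makefile HEAD:Makefile)    # compare two versions of one file
echo "Makefile in v2.5: $old"
echo "$changes"
```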
Next Steps
----------
This tutorial should be enough to perform basic distributed revision
control for your projects. However, to fully understand the depth
and power of git you need to understand two simple ideas on which it
is based:
* The object database is the rather elegant system used to
store the history of your project--files, directories, and
commits.
* The index file is a cache of the state of a directory tree,
used to create commits, check out working directories, and
hold the various trees involved in a merge.
link:tutorial-2.html[Part two of this tutorial] explains the object
database, the index file, and a few other odds and ends that you'll
need to make the most of git.
If you don't want to continue with that right away, a few other
digressions that may be interesting at this point are:
* gitlink:git-format-patch[1], gitlink:git-am[1]: These convert
series of git commits into emailed patches, and vice versa,
useful for projects such as the Linux kernel which rely heavily
on emailed patches.
* gitlink:git-bisect[1]: When there is a regression in your
project, one way to track down the bug is by searching through
the history to find the exact commit that's to blame. Git bisect
can help you perform a binary search for that commit. It is
smart enough to perform a close-to-optimal search even in the
case of complex non-linear history with lots of merged branches.
* link:everyday.html[Everyday GIT with 20 Commands Or So]
* link:cvs-migration.html[git for CVS users].