This creates a simple specialized object allocator for basic
objects.
This avoids wasting space with malloc overhead (metadata and
extra alignment), since the specialized allocator knows the
alignment, and that objects, once allocated, are never freed.
It also allows us to track some basic statistics about object
allocations. For example, for the mozilla import, it shows
object usage as follows:
blobs: 627629 (14710 kB)
trees: 1119035 (34969 kB)
commits: 196423 (8440 kB)
tags: 1336 (46 kB)
and the simpler allocator shaves off about 2.5% off the memory
footprint off a "git-rev-list --all --objects", and is a bit
faster too.
[ Side note: this concludes the series of "save memory in object storage".
The thing is, there simply isn't much more to be saved on the objects.
Doing "git-rev-list --all --objects" on the mozilla archive has a final
total RSS of 131498 pages for me: that's about 513MB. Of that, the
object overhead is now just 56MB, the rest is going somewhere else (put
another way: the fact that this patch shaves off 2.5% of the total
memory overhead, considering that objects are now not much more than 10%
of the total shows how big the wasted space really was: this makes
object allocations much more memory- and time-efficient).
I haven't looked at where the rest is, but I suspect the bulk of it is
just the pack-file loading. It may be that we should pack the tree
objects separately from the blob objects: for git-rev-list --objects, we
don't actually ever need to even look at the blobs, but since trees and
blobs are interspersed in the pack-file, we end up not being dense in
the tree accesses, so we end up looking at more pages than we strictly
need to.
So with a 535MB pack-file, it's entirely possible - even likely - that
most of the remaining RSS is just the mmap of the pack-file itself. We
don't need to map in _all_ of it, but we do end up mapping a fair
amount. ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Every single user actually wanted this only for commit objects, and we
have no reason to waste space on it for other object types. So just move
the structure member from the low-level "struct object" into the "struct
commit".
This leaves the commit object the same size, and removes one unnecessary
pointer from all other object allocations.
This shrinks memory usage (still at a fairly hefty half-gig, admittedly)
of "git-rev-list --all --objects" on the mozilla repo by another 5% in my
tests.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.
In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.
Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.
This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.
There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.
Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When --attach is not used, usually we do not say Content-Type:
and fluff, but if the commit message is not 7-bit ASCII, mark
it as "text/plain; charset=UTF-8". This unclutters output
somewhat.
Signed-off-by: Junio C Hamano <junkio@cox.net>
By convention, the commit message and the author/committer names
in the commit objects are UTF-8 encoded. When formatting for
e-mails, Q-encode them according to RFC 2047.
While we are at it, generate the content-type and
content-transfer-encoding headers as well.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch touches a couple of files, because it adds options to print a
custom text just after the subject of a commit, and just after the
diffstat.
[jc: made "many dashes" used as the boundary leader into a single
variable, to reduce the possibility of later tweaks to miscount the
number of dashes to break it.]
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Still Work-in-progress git fmt-patch (should it be known as
format-patch-ng?) is matched with the fix made by Huw Davies
in 262a6ef76a commit to use
RFC2822 date format.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Updating "subject" variable without changing the hardcoded
number of bytes to memcpy from it would not help much.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This only does --stdout right now. To write into separate files
with pretty-printed filenames like the real thing does, it needs
a bit mroe work.
Signed-off-by: Junio C Hamano <junkio@cox.net>
In addition to the existing comment support, that just allows the user
to use a convention that works pretty much everywhere else.
Signed-off-by: Yann Dirson <ydirson@altern.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Partly because we've messed up and now have some commits with trailing
whitespace, but partly because this also just simplifies the code, let's
remove trailing whitespace from the end when pretty-printing commits.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds --date-order to rev-list; it is similar to topo order
in the sense that no parent comes before all of its children,
but otherwise things are still ordered in the commit timestamp
order.
The same flag is also added to show-branch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Earlier, f2d4227530 commit broke Merge:
lines for unabbreviated case. Do not emit extra dots if we do not
abbreviate.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When displaying Merge: lines, we used to take the real commit
parents from the commit objects. Use the parsed parents from
the commit object instead, so that we honor fake parent
information from info/grafts.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When describing more than one, we need to clear the commit marks
before handling the next one, but most of the time we are
running it for only one commit, and in such a case this clearing
phase is totally unnecessary.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The main loop was prepared to take more than one revs, but the actual
naming logic wad not (it used pop_most_recent_commit while forgetting
that the commit marks stay after it's done).
Signed-off-by: Junio C Hamano <junkio@cox.net>
ISO C99 (and GCC 3.x or later) lets you write a flexible array
at the end of a structure, like this:
struct frotz {
int xyzzy;
char nitfol[]; /* more */
};
GCC 2.95 and 2.96 let you to do this with "char nitfol[0]";
unfortunately this is not allowed by ISO C90.
This declares such construct like this:
struct frotz {
int xyzzy;
char nitfol[FLEX_ARRAY]; /* more */
};
and git-compat-util.h defines FLEX_ARRAY to 0 for gcc 2.95 and
empty for others.
If you are using a C90 C compiler, you should be able
to override this with CFLAGS=-DFLEX_ARRAY=1 from the
command line of "make".
Signed-off-by: Junio C Hamano <junkio@cox.net>
dietlibc versions of malloc, calloc and realloc all return NULL if
they're told to allocate 0 bytes, causes the x* wrappers to die().
There are several more places where these calls could end up asking
for 0 bytes, too...
Maybe simply not die()-ing in the x* wrappers if 0/NULL is returned
when the requested size is zero is a safer and easier way to go.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Store pointers to referenced objects in a variable sized array instead
of linked list. This cuts down memory usage of utilities which use
object references; e.g., git-fsck-objects --full on the git.git
repository consumes about 2 MB of memory tracked by Massif instead of
7 MB before the change. Object refs are still the biggest consumer of
memory (57%), but the malloc overhead for a single block instead of a
linked list is substantially smaller.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This fixes git-rev-list so that when there are multiple branches, we still
sort the heads in proper approximate date order even when sorting the
output topologically.
This makes things like
gitk --all -d
work sanely and show the branches in date order (where "date order" is
obviously modified by the paren-child dependency requirements of the
topological sort).
The trivial fix is to just build the "work" list in date order rather than
inserting the new work entries at the beginning.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
git log without --pretty showed author and author-date, while
with --pretty=full showed author and committer but no dates.
The new formatting option, --pretty=fuller, shows both name and
timestamp for author and committer.
Signed-off-by: Junio C Hamano <junkio@cox.net>
One caller of deref_tag() was not careful enough to make sure
what deref_tag() returned was not NULL (i.e. we found a tag
object that points at an object we do not have). Fix it, and
warn about refs that point at such an incomplete tag where
needed.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Do our own ctype.h, just to get the sane semantics: we want
locale-independence, _and_ we want the right signed behaviour. Plus we
only use a very small subset of ctype.h anyway (isspace, isalpha,
isdigit and isalnum).
Signed-off-by: Junio C Hamano <junkio@cox.net>
As pointed out on the list, git-rev-list can use a lot of memory.
One low-hanging fruit is to free the commit buffer for commits that we
parse. By default, parse_commit() will save away the buffer, since a lot
of cases do want it, and re-reading it continually would be unnecessary.
However, in many cases the buffer isn't actually necessary and saving it
just wastes memory.
We could just free the buffer ourselves, but especially in git-rev-list,
we actually end up using the helper functions that automatically add
parent commits to the commit lists, so we don't actually control the
commit parsing directly.
Instead, just make this behaviour of "parse_commit()" a global flag.
Maybe this is a bit tasteless, but it's very simple, and it makes a
noticable difference in memory usage.
Before the change:
[torvalds@g5 linux]$ /usr/bin/time git-rev-list v2.6.12..HEAD > /dev/null
0.26user 0.02system 0:00.28elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+3714minor)pagefaults 0swaps
after the change:
[torvalds@g5 linux]$ /usr/bin/time git-rev-list v2.6.12..HEAD > /dev/null
0.26user 0.00system 0:00.27elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2433minor)pagefaults 0swaps
note how the minor faults have decreased from 3714 pages to 2433 pages.
That's all due to the fewer anonymous pages allocated to hold the comment
buffers and their metadata.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This reverts 6c5f9baa3b commit, whose
change breaks gcc-2.95.
Not that I ignore portability to compilers that are properly C99, but
keeping compilation with GCC working is more important, at least for
now. We would probably end up declaring with "name[1]" and teach the
allocator to subtract one if we really aimed for portability, but that
is left for later rounds.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The 'git show-branches' command turns out to be reasonably useful,
but painfully slow. So rewrite it in C, using ideas from merge-base
while enhancing it a bit more.
- Unlike show-branches, it can take --heads (show me all my
heads), --tags (show me all my tags), or --all (both).
- It can take --more=<number> to show beyond the merge-base.
- It shows the short name for each commit in the extended SHA1
syntax.
- It can find merge-base for more than two heads.
Examples:
$ git show-branch --more=6 HEAD
is almost the same as "git log --pretty=oneline --max-count=6".
$ git show-branch --merge-base master mhf misc
finds the merge base of the three given heads.
$ git show-branch master mhf misc
shows logs from the top of these three branch heads, up to their
common ancestor commit is shown.
$ git show-branch --all --more=10
is poor-man's gitk, showing all the tags and heads, and
going back 10 commits beyond the merge base of those refs.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This introduces --pretty=oneline to git-rev-tree and
git-rev-list commands to show only the first line of the commit
message, without frills.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Again I left the v2.6.11-tree tag behind. My bad.
This commit makes sure that we do not barf when pushing a ref
that is a non-commitish tag. You can update a remote ref under
the following conditions:
* You can always use --force.
* Creating a brand new ref is OK.
* If the remote ref is exactly the same as what you are
pushing, it is OK (nothing is pushed).
* You can replace a commitish with another commitish which is a
descendant of it, if you can verify the ancestry between them;
this and the above means you have to have what you are replacing.
* Otherwise you cannot update; you need to use --force.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Introduce a new file $GIT_DIR/info/grafts (or $GIT_GRAFT_FILE)
which is a list of "fake commit parent records". Each line of
this file is a commit ID, followed by parent commit IDs, all
40-byte hex SHA1 separated by a single SP in between. The
records override the parent information we would normally read
from the commit objects, allowing both adding "fake" parents
(i.e. grafting), and pretending as if a commit is not a child of
some of its real parents (i.e. cauterizing).
Signed-off-by: Junio C Hamano <junkio@cox.net>
This was brought on by a bad tree of Thomas Gleixner, where some bogus
commit objects weren't warned about properly
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we allow a tag object in place of a commit object, we only
dereferenced the given tag once, which causes a tag that points at a tag
that points at a commit to be rejected. Instead, dereference tag
repeatedly until we get a non-tag.
This patch makes change to two functions:
- commit.c::lookup_commit_reference() is used by merge-base,
rev-tree and rev-parse to convert user supplied SHA1 to that of
a commit.
- rev-list uses its own get_commit_reference() to do the same.
Dereferencing tags this way helps both of these uses.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This introduces an in-place topological sort procedure to commit.c.
Given a list of commits, sort_in_topological_order() will perform an in-place
topological sort of that list.
The invariant that applies to the resulting list is:
a reachable from b => ord(b) < ord(a)
This invariant is weaker than the --merge-order invariant, but is cheaper
to calculate (assuming the list has been identified) and will serve any
purpose where only a minimal topological order guarantee is required.
Signed-off-by: Jon Seymour <jon.seymour@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>