|
|
|
#ifndef OBJECT_H
|
|
|
|
#define OBJECT_H
|
|
|
|
|
|
|
|
struct object_list {
|
|
|
|
struct object *item;
|
|
|
|
struct object_list *next;
|
|
|
|
};
|
|
|
|
|
Add "named object array" concept
We've had this notion of a "object_list" for a long time, which eventually
grew a "name" member because some users (notably git-rev-list) wanted to
name each object as it is generated.
That object_list is great for some things, but it isn't all that wonderful
for others, and the "name" member is generally not used by everybody.
This patch splits the users of the object_list array up into two: the
traditional list users, who want the list-like format, and who don't
actually use or want the name. And another class of users that really used
the list as an extensible array, and generally wanted to name the objects.
The patch is fairly straightforward, but it's also biggish. Most of it
really just cleans things up: switching the revision parsing and listing
over to the array makes things like the builtin-diff usage much simpler
(we now see exactly how many members the array has, and we don't get the
objects reversed from the order they were on the command line).
One of the main reasons for doing this at all is that the malloc overhead
of the simple object list was actually pretty high, and the array is just
a lot denser. So this patch brings down memory usage by git-rev-list by
just under 3% (on top of all the other memory use optimizations) on the
mozilla archive.
It does add more lines than it removes, and more importantly, it adds a
whole new infrastructure for maintaining lists of objects, but on the
other hand, the new dynamic array code is pretty obvious. The change to
builtin-diff-tree.c shows a fairly good example of why an array interface
is sometimes more natural, and just much simpler for everybody.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
19 years ago
|
|
|
struct object_array {
|
|
|
|
unsigned int nr;
|
|
|
|
unsigned int alloc;
|
|
|
|
struct object_array_entry {
|
|
|
|
struct object *item;
|
object_array_entry: fix memory handling of the name field
Previously, the memory management of the object_array_entry::name
field was inconsistent and undocumented. object_array_entries are
ultimately created by a single function, add_object_array_with_mode(),
which has an argument "const char *name". This function used to
simply set the name field to reference the string pointed to by the
name parameter, and nobody on the object_array side ever freed the
memory. Thus, it assumed that the memory for the name field would be
managed by the caller, and that the lifetime of that string would be
at least as long as the lifetime of the object_array_entry. But
callers were inconsistent:
* Some passed pointers to constant strings or argv entries, which was
OK.
* Some passed pointers to newly-allocated memory, but didn't arrange
for the memory ever to be freed.
* Some passed the return value of sha1_to_hex(), which is a pointer to
a statically-allocated buffer that can be overwritten at any time.
* Some passed pointers to refnames that they received from a
for_each_ref()-type iteration, but the lifetimes of such refnames is
not guaranteed by the refs API.
Bring consistency to this mess by changing object_array to make its
own copy for the object_array_entry::name field and free this memory
when an object_array_entry is deleted from the array.
Many callers were passing the empty string as the name parameter, so
as a performance optimization, treat the empty string specially.
Instead of making a copy, store a pointer to a statically-allocated
empty string to object_array_entry::name. When deleting such an
entry, skip the free().
Change the callers that were already passing copies to
add_object_array_with_mode() to either skip the copy, or (if the
memory needed to be allocated anyway) freeing the memory itself.
A part of this commit effectively reverts
70d26c6e76 read_revisions_from_stdin: make copies for handle_revision_arg
because the copying introduced by that commit (which is still
necessary) is now done at a deeper level.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
12 years ago
|
|
|
/*
|
|
|
|
* name or NULL. If non-NULL, the memory pointed to
|
|
|
|
* is owned by this object *except* if it points at
|
|
|
|
* object_array_slopbuf, which is a static copy of the
|
|
|
|
* empty string.
|
|
|
|
*/
|
|
|
|
char *name;
|
|
|
|
char *path;
|
|
|
|
unsigned mode;
|
Add "named object array" concept
We've had this notion of a "object_list" for a long time, which eventually
grew a "name" member because some users (notably git-rev-list) wanted to
name each object as it is generated.
That object_list is great for some things, but it isn't all that wonderful
for others, and the "name" member is generally not used by everybody.
This patch splits the users of the object_list array up into two: the
traditional list users, who want the list-like format, and who don't
actually use or want the name. And another class of users that really used
the list as an extensible array, and generally wanted to name the objects.
The patch is fairly straightforward, but it's also biggish. Most of it
really just cleans things up: switching the revision parsing and listing
over to the array makes things like the builtin-diff usage much simpler
(we now see exactly how many members the array has, and we don't get the
objects reversed from the order they were on the command line).
One of the main reasons for doing this at all is that the malloc overhead
of the simple object list was actually pretty high, and the array is just
a lot denser. So this patch brings down memory usage by git-rev-list by
just under 3% (on top of all the other memory use optimizations) on the
mozilla archive.
It does add more lines than it removes, and more importantly, it adds a
whole new infrastructure for maintaining lists of objects, but on the
other hand, the new dynamic array code is pretty obvious. The change to
builtin-diff-tree.c shows a fairly good example of why an array interface
is sometimes more natural, and just much simpler for everybody.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
19 years ago
|
|
|
} *objects;
|
|
|
|
};
|
|
|
|
|
|
|
|
#define OBJECT_ARRAY_INIT { 0, 0, NULL }
|
|
|
|
|
Shrink "struct object" a bit
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.
In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.
Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.
This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.
There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.
Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
19 years ago
|
|
|
#define TYPE_BITS 3
|
|
|
|
/*
|
|
|
|
* object flag allocation:
|
|
|
|
* revision.h: 0---------10 26
|
fetch-pack: cache results of for_each_alternate_ref
We may run for_each_alternate_ref() twice, once in
find_common() and once in everything_local(). This operation
can be expensive, because it involves running a sub-process
which must freshly load all of the alternate's refs from
disk.
Let's cache and reuse the results between the two calls. We
can make some optimizations based on the particular use
pattern in fetch-pack to keep our memory usage down.
The first is that we only care about the sha1s, not the refs
themselves. So it's OK to store only the sha1s, and to
suppress duplicates. The natural fit would therefore be a
sha1_array.
However, sha1_array's de-duplication happens only after it
has read and sorted all entries. It still stores each
duplicate. For an alternate with a large number of refs
pointing to the same commits, this is a needless expense.
Instead, we'd prefer to eliminate duplicates before putting
them in the cache, which implies using a hash. We can
further note that fetch-pack will call parse_object() on
each alternate sha1. We can therefore keep our cache as a
set of pointers to "struct object". That gives us a place to
put our "already seen" bit with an optimized hash lookup.
And as a bonus, the object stores the sha1 for us, so
pointer-to-object is all we need.
There are two extra optimizations I didn't do here:
- we actually store an array of pointer-to-object.
Technically we could just walk the obj_hash table
looking for entries with the ALTERNATE flag set (because
our use case doesn't care about the order here).
But that hash table may be mostly composed of
non-ALTERNATE entries, so we'd waste time walking over
them. So it would be a slight win in memory use, but a
loss in CPU.
- the items we pull out of the cache are actual "struct
object"s, but then we feed "obj->sha1" to our
sub-functions, which promptly call parse_object().
This second parse is cheap, because it starts with
lookup_object() and will bail immediately when it sees
we've already parsed the object. We could save the extra
hash lookup, but it would involve refactoring the
functions we call. It may or may not be worth the
trouble.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
8 years ago
|
|
|
* fetch-pack.c: 0---5
|
|
|
|
* walker.c: 0-2
|
|
|
|
* upload-pack.c: 4 11----------------19
|
|
|
|
* builtin/blame.c: 12-13
|
|
|
|
* bisect.c: 16
|
|
|
|
* bundle.c: 16
|
|
|
|
* http-push.c: 16-----19
|
|
|
|
* commit.c: 16-----19
|
|
|
|
* sha1_name.c: 20
|
|
|
|
*/
|
Shrink "struct object" a bit
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.
In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.
Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.
This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.
There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.
Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
19 years ago
|
|
|
#define FLAG_BITS 27
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The object type is stored in 3 bits.
|
|
|
|
*/
|
|
|
|
struct object {
|
|
|
|
unsigned parsed : 1;
|
|
|
|
unsigned used : 1;
|
Shrink "struct object" a bit
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.
In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.
Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.
This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.
There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.
Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
19 years ago
|
|
|
unsigned type : TYPE_BITS;
|
|
|
|
unsigned flags : FLAG_BITS;
|
|
|
|
struct object_id oid;
|
|
|
|
};
|
|
|
|
|
|
|
|
extern const char *typename(unsigned int type);
|
|
|
|
extern int type_from_string_gently(const char *str, ssize_t, int gentle);
|
|
|
|
#define type_from_string(str) type_from_string_gently(str, -1, 0)
|
Shrink "struct object" a bit
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.
In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.
Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.
This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.
There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.
Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
19 years ago
|
|
|
|
|
|
|
/*
|
|
|
|
* Return the current number of buckets in the object hashmap.
|
|
|
|
*/
|
|
|
|
extern unsigned int get_max_object_index(void);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Return the object from the specified bucket in the object hashmap.
|
|
|
|
*/
|
|
|
|
extern struct object *get_indexed_object(unsigned int);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This can be used to see if we have heard of the object before, but
|
|
|
|
* it can return "yes we have, and here is a half-initialised object"
|
|
|
|
* for an object that we haven't loaded/parsed yet.
|
|
|
|
*
|
|
|
|
* When parsing a commit to create an in-core commit object, its
|
|
|
|
* parents list holds commit objects that represent its parents, but
|
|
|
|
* they are expected to be lazily initialized and do not know what
|
|
|
|
* their trees or parents are yet. When this function returns such a
|
|
|
|
* half-initialised objects, the caller is expected to initialize them
|
|
|
|
* by calling parse_object() on them.
|
|
|
|
*/
|
|
|
|
struct object *lookup_object(const unsigned char *sha1);
|
|
|
|
|
|
|
|
extern void *create_object(const unsigned char *sha1, void *obj);
|
|
|
|
|
add object_as_type helper for casting objects
When we call lookup_commit, lookup_tree, etc, the logic goes
something like:
1. Look for an existing object struct. If we don't have
one, allocate and return a new one.
2. Double check that any object we have is the expected
type (and complain and return NULL otherwise).
3. Convert an object with type OBJ_NONE (from a prior
call to lookup_unknown_object) to the expected type.
We can encapsulate steps 2 and 3 in a helper function which
checks whether we have the expected object type, converts
OBJ_NONE as appropriate, and returns the object.
Not only does this shorten the code, but it also provides
one central location for converting OBJ_NONE objects into
objects of other types. Future patches will use that to
enforce type-specific invariants.
Since this is a refactoring, we would want it to behave
exactly as the current code. It takes a little reasoning to
see that this is the case:
- for lookup_{commit,tree,etc} functions, we are just
pulling steps 2 and 3 into a function that does the same
thing.
- for the call in peel_object, we currently only do step 3
(but we want to consolidate it with the others, as
mentioned above). However, step 2 is a noop here, as the
surrounding conditional makes sure we have OBJ_NONE
(which we want to keep to avoid an extraneous call to
sha1_object_info).
- for the call in lookup_commit_reference_gently, we are
currently doing step 2 but not step 3. However, step 3
is a noop here. The object we got will have just come
from deref_tag, which must have figured out the type for
each object in order to know when to stop peeling.
Therefore the type will never be OBJ_NONE.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
void *object_as_type(struct object *obj, enum object_type type, int quiet);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Returns the object, having parsed it to find out what it is.
|
|
|
|
*
|
|
|
|
* Returns NULL if the object is missing or corrupt.
|
|
|
|
*/
|
|
|
|
struct object *parse_object(const unsigned char *sha1);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Like parse_object, but will die() instead of returning NULL. If the
|
|
|
|
* "name" parameter is not NULL, it is included in the error message
|
|
|
|
* (otherwise, the sha1 hex is given).
|
|
|
|
*/
|
|
|
|
struct object *parse_object_or_die(const unsigned char *sha1, const char *name);
|
|
|
|
|
|
|
|
/* Given the result of read_sha1_file(), returns the object after
|
|
|
|
* parsing it. eaten_p indicates if the object has a borrowed copy
|
|
|
|
* of buffer and the caller should not free() it.
|
|
|
|
*/
|
|
|
|
struct object *parse_object_buffer(const unsigned char *sha1, enum object_type type, unsigned long size, void *buffer, int *eaten_p);
|
|
|
|
|
|
|
|
/** Returns the object, with potentially excess memory allocated. **/
|
|
|
|
struct object *lookup_unknown_object(const unsigned char *sha1);
|
|
|
|
|
|
|
|
struct object_list *object_list_insert(struct object *item,
|
|
|
|
struct object_list **list_p);
|
|
|
|
|
|
|
|
int object_list_contains(struct object_list *list, struct object *obj);
|
|
|
|
|
Add "named object array" concept
We've had this notion of a "object_list" for a long time, which eventually
grew a "name" member because some users (notably git-rev-list) wanted to
name each object as it is generated.
That object_list is great for some things, but it isn't all that wonderful
for others, and the "name" member is generally not used by everybody.
This patch splits the users of the object_list array up into two: the
traditional list users, who want the list-like format, and who don't
actually use or want the name. And another class of users that really used
the list as an extensible array, and generally wanted to name the objects.
The patch is fairly straightforward, but it's also biggish. Most of it
really just cleans things up: switching the revision parsing and listing
over to the array makes things like the builtin-diff usage much simpler
(we now see exactly how many members the array has, and we don't get the
objects reversed from the order they were on the command line).
One of the main reasons for doing this at all is that the malloc overhead
of the simple object list was actually pretty high, and the array is just
a lot denser. So this patch brings down memory usage by git-rev-list by
just under 3% (on top of all the other memory use optimizations) on the
mozilla archive.
It does add more lines than it removes, and more importantly, it adds a
whole new infrastructure for maintaining lists of objects, but on the
other hand, the new dynamic array code is pretty obvious. The change to
builtin-diff-tree.c shows a fairly good example of why an array interface
is sometimes more natural, and just much simpler for everybody.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
19 years ago
|
|
|
/* Object array handling .. */
|
|
|
|
void add_object_array(struct object *obj, const char *name, struct object_array *array);
|
|
|
|
void add_object_array_with_path(struct object *obj, const char *name, struct object_array *array, unsigned mode, const char *path);
|
|
|
|
|
|
|
|
typedef int (*object_array_each_func_t)(struct object_array_entry *, void *);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Apply want to each entry in array, retaining only the entries for
|
|
|
|
* which the function returns true. Preserve the order of the entries
|
|
|
|
* that are retained.
|
|
|
|
*/
|
|
|
|
void object_array_filter(struct object_array *array,
|
|
|
|
object_array_each_func_t want, void *cb_data);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Remove from array all but the first entry with a given name.
|
|
|
|
* Warning: this function uses an O(N^2) algorithm.
|
|
|
|
*/
|
|
|
|
void object_array_remove_duplicates(struct object_array *array);
|
Add "named object array" concept
We've had this notion of a "object_list" for a long time, which eventually
grew a "name" member because some users (notably git-rev-list) wanted to
name each object as it is generated.
That object_list is great for some things, but it isn't all that wonderful
for others, and the "name" member is generally not used by everybody.
This patch splits the users of the object_list array up into two: the
traditional list users, who want the list-like format, and who don't
actually use or want the name. And another class of users that really used
the list as an extensible array, and generally wanted to name the objects.
The patch is fairly straightforward, but it's also biggish. Most of it
really just cleans things up: switching the revision parsing and listing
over to the array makes things like the builtin-diff usage much simpler
(we now see exactly how many members the array has, and we don't get the
objects reversed from the order they were on the command line).
One of the main reasons for doing this at all is that the malloc overhead
of the simple object list was actually pretty high, and the array is just
a lot denser. So this patch brings down memory usage by git-rev-list by
just under 3% (on top of all the other memory use optimizations) on the
mozilla archive.
It does add more lines than it removes, and more importantly, it adds a
whole new infrastructure for maintaining lists of objects, but on the
other hand, the new dynamic array code is pretty obvious. The change to
builtin-diff-tree.c shows a fairly good example of why an array interface
is sometimes more natural, and just much simpler for everybody.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
19 years ago
|
|
|
|
|
|
|
/*
|
|
|
|
* Remove any objects from the array, freeing all used memory; afterwards
|
|
|
|
* the array is ready to store more objects with add_object_array().
|
|
|
|
*/
|
|
|
|
void object_array_clear(struct object_array *array);
|
|
|
|
|
|
|
|
void clear_object_flags(unsigned flags);
|
|
|
|
|
|
|
|
#endif /* OBJECT_H */
|