git/blob.c

#include "cache.h"
#include "blob.h"
#include <stdlib.h>

const char *blob_type = "blob";

struct blob *lookup_blob(const unsigned char *sha1)
{
	struct object *obj = lookup_object(sha1);
	if (!obj) {
		struct blob *ret = alloc_blob_node();
		created_object(sha1, &ret->object);
		ret->object.type = OBJ_BLOB;
		return ret;
	}
	if (!obj->type)
		obj->type = OBJ_BLOB;
	if (obj->type != OBJ_BLOB) {
		error("Object %s is a %s, not a blob",
		      sha1_to_hex(sha1), typename(obj->type));
		return NULL;
	}
	return (struct blob *) obj;
}

int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
{
	item->object.parsed = 1;
	return 0;
}

int parse_blob(struct blob *item)
{
        char type[20];
        void *buffer;
        unsigned long size;
	int ret;

        if (item->object.parsed)
                return 0;
        buffer = read_sha1_file(item->object.sha1, type, &size);
        if (!buffer)
                return error("Could not read %s",
                             sha1_to_hex(item->object.sha1));
        if (strcmp(type, blob_type))
                return error("Object %s not a blob",
                             sha1_to_hex(item->object.sha1));
	ret = parse_blob_buffer(item, buffer, size);
	free(buffer);
	return ret;
}
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago			`#include "cache.h"`
[PATCH] Compilation: zero-length array declaration. ISO C99 (and GCC 3.x or later) lets you write a flexible array at the end of a structure, like this: struct frotz { int xyzzy; char nitfol[]; /* more / }; GCC 2.95 and 2.96 let you to do this with "char nitfol[0]"; unfortunately this is not allowed by ISO C90. This declares such construct like this: struct frotz { int xyzzy; char nitfol[FLEX_ARRAY]; / more */ }; and git-compat-util.h defines FLEX_ARRAY to 0 for gcc 2.95 and empty for others. If you are using a C90 C compiler, you should be able to override this with CFLAGS=-DFLEX_ARRAY=1 from the command line of "make". Signed-off-by: Junio C Hamano <junkio@cox.net> 19 years ago			`#include "blob.h"`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago			`#include <stdlib.h>`

			`const char *blob_type = "blob";`

[PATCH] Anal retentive 'const unsigned char *sha1' Make 'sha1' parameters const where possible Signed-off-by: Jason McMullan <jason.mcmullan@timesys.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago			`struct blob lookup_blob(const unsigned char sha1)`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago			`{`
			`struct object *obj = lookup_object(sha1);`
			`if (!obj) {`
Add specialized object allocator This creates a simple specialized object allocator for basic objects. This avoids wasting space with malloc overhead (metadata and extra alignment), since the specialized allocator knows the alignment, and that objects, once allocated, are never freed. It also allows us to track some basic statistics about object allocations. For example, for the mozilla import, it shows object usage as follows: blobs: 627629 (14710 kB) trees: 1119035 (34969 kB) commits: 196423 (8440 kB) tags: 1336 (46 kB) and the simpler allocator shaves off about 2.5% off the memory footprint off a "git-rev-list --all --objects", and is a bit faster too. [ Side note: this concludes the series of "save memory in object storage". The thing is, there simply isn't much more to be saved on the objects. Doing "git-rev-list --all --objects" on the mozilla archive has a final total RSS of 131498 pages for me: that's about 513MB. Of that, the object overhead is now just 56MB, the rest is going somewhere else (put another way: the fact that this patch shaves off 2.5% of the total memory overhead, considering that objects are now not much more than 10% of the total shows how big the wasted space really was: this makes object allocations much more memory- and time-efficient). I haven't looked at where the rest is, but I suspect the bulk of it is just the pack-file loading. It may be that we should pack the tree objects separately from the blob objects: for git-rev-list --objects, we don't actually ever need to even look at the blobs, but since trees and blobs are interspersed in the pack-file, we end up not being dense in the tree accesses, so we end up looking at more pages than we strictly need to. So with a 535MB pack-file, it's entirely possible - even likely - that most of the remaining RSS is just the mmap of the pack-file itself. We don't need to map in _all_ of it, but we do end up mapping a fair amount. ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 19 years ago			`struct blob *ret = alloc_blob_node();`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago			`created_object(sha1, &ret->object);`
Remove TYPE_* constant macros and use object_type enums consistently. This updates the type-enumeration constants introduced to reduce the memory footprint of "struct object" to match the type bits already used in the packfile format, by removing the former (i.e. TYPE_* constant macros) and using the latter (i.e. enum object_type) throughout the code for consistency. Eventually we can stop passing around the "type strings" entirely, and this will help - no confusion about two different integer enumeration. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 19 years ago			`ret->object.type = OBJ_BLOB;`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago			`return ret;`
			`}`
[PATCH] delta check This adds knowledge of delta objects to fsck-cache and various object parsing code. A new switch to git-fsck-cache is provided to display the maximum delta depth found in a repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago			`if (!obj->type)`
Remove TYPE_* constant macros and use object_type enums consistently. This updates the type-enumeration constants introduced to reduce the memory footprint of "struct object" to match the type bits already used in the packfile format, by removing the former (i.e. TYPE_* constant macros) and using the latter (i.e. enum object_type) throughout the code for consistency. Eventually we can stop passing around the "type strings" entirely, and this will help - no confusion about two different integer enumeration. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 19 years ago			`obj->type = OBJ_BLOB;`
			`if (obj->type != OBJ_BLOB) {`
Shrink "struct object" a bit This shrinks "struct object" by a small amount, by getting rid of the "struct type *" pointer and replacing it with a 3-bit bitfield instead. In addition, we merge the bitfields and the "flags" field, which incidentally should also remove a useless 4-byte padding from the object when in 64-bit mode. Now, our "struct object" is still too damn large, but it's now less obviously bloated, and of the remaining fields, only the "util" (which is not used by most things) is clearly something that should be eventually discarded. This shrinks the "git-rev-list --all" memory use by about 2.5% on the kernel archive (and, perhaps more importantly, on the larger mozilla archive). That may not sound like much, but I suspect it's more on a 64-bit platform. There are other remaining inefficiencies (the parent lists, for example, probably have horrible malloc overhead), but this was pretty obvious. Most of the patch is just changing the comparison of the "type" pointer from one of the constant string pointers to the appropriate new TYPE_xxx small integer constant. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 19 years ago			`error("Object %s is a %s, not a blob",`
			`sha1_to_hex(sha1), typename(obj->type));`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago			`return NULL;`
			`}`
			`return (struct blob *) obj;`
			`}`
[PATCH] Mark blobs as parsed when they're actually parsed This eliminates the special case for blobs versus other types of objects. Now the scheme is entirely regular and I won't introduce stupid bugs. (And fsck-cache doesn't have to do the do-nothing parse) Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago
[PATCH] don't load and decompress objects twice with parse_object() It turns out that parse_object() is loading and decompressing given object to free it just before calling the specific object parsing function which does mmap and decompress the same object again. This patch introduces the ability to parse specific objects directly from a memory buffer. Without this patch, running git-fsck-cache on the kernel repositorytake: real 0m13.006s user 0m11.421s sys 0m1.218s With this patch applied: real 0m8.060s user 0m7.071s sys 0m0.710s The performance increase is significant, and this is kind of a prerequisite for sane delta object support with fsck. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago			`int parse_blob_buffer(struct blob item, void buffer, unsigned long size)`
			`{`
			`item->object.parsed = 1;`
			`return 0;`
			`}`

[PATCH] Mark blobs as parsed when they're actually parsed This eliminates the special case for blobs versus other types of objects. Now the scheme is entirely regular and I won't introduce stupid bugs. (And fsck-cache doesn't have to do the do-nothing parse) Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago			`int parse_blob(struct blob *item)`
			`{`
			`char type[20];`
			`void *buffer;`
			`unsigned long size;`
[PATCH] don't load and decompress objects twice with parse_object() It turns out that parse_object() is loading and decompressing given object to free it just before calling the specific object parsing function which does mmap and decompress the same object again. This patch introduces the ability to parse specific objects directly from a memory buffer. Without this patch, running git-fsck-cache on the kernel repositorytake: real 0m13.006s user 0m11.421s sys 0m1.218s With this patch applied: real 0m8.060s user 0m7.071s sys 0m0.710s The performance increase is significant, and this is kind of a prerequisite for sane delta object support with fsck. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago			`int ret;`

[PATCH] Mark blobs as parsed when they're actually parsed This eliminates the special case for blobs versus other types of objects. Now the scheme is entirely regular and I won't introduce stupid bugs. (And fsck-cache doesn't have to do the do-nothing parse) Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago			`if (item->object.parsed)`
			`return 0;`
			`buffer = read_sha1_file(item->object.sha1, type, &size);`
			`if (!buffer)`
			`return error("Could not read %s",`
			`sha1_to_hex(item->object.sha1));`
			`if (strcmp(type, blob_type))`
			`return error("Object %s not a blob",`
			`sha1_to_hex(item->object.sha1));`
[PATCH] don't load and decompress objects twice with parse_object() It turns out that parse_object() is loading and decompressing given object to free it just before calling the specific object parsing function which does mmap and decompress the same object again. This patch introduces the ability to parse specific objects directly from a memory buffer. Without this patch, running git-fsck-cache on the kernel repositorytake: real 0m13.006s user 0m11.421s sys 0m1.218s With this patch applied: real 0m8.060s user 0m7.071s sys 0m0.710s The performance increase is significant, and this is kind of a prerequisite for sane delta object support with fsck. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago			`ret = parse_blob_buffer(item, buffer, size);`
			`free(buffer);`
			`return ret;`
[PATCH] Mark blobs as parsed when they're actually parsed This eliminates the special case for blobs versus other types of objects. Now the scheme is entirely regular and I won't introduce stupid bugs. (And fsck-cache doesn't have to do the do-nothing parse) Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 20 years ago			`}`