(updated from the version posted to GIT mailing list).
When a new blob is registered with update-cache, and before the cache
is written as a tree and committed, git-fsck-cache will find the blob
unreachable. This patch adds a new flag, "--cache" to git-fsck-cache,
with which it keeps such blobs from considered "unreachable".
The git-prune-script is updated to use this new flag. At the same time
it adds .git/refs/*/* to the set of default locations to look for heads,
which should be consistent with expectations from Cogito users.
Without this fix, "diff-cache -p --cached" after git-prune-script has
pruned the blob object will fail mysteriously and git-write-tree would
also fail.
Signed-off-by: Junio C Hamano <junkio@cox.net>
fsck_tag() failes to notice that the parsing of the tag may
have failed in the parse_object() call on the object that it
is tagging.
Noticed by Junio.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We check the ordering of the entries, and we verify that none
of the entries has a slash in it (this allows us to remove the
hacky "has_full_path" member from the tree structure, since we
now just test it by walking the tree entries instead).
This improves the cold-cache behaviour on most filesystems,
since it makes the fsck access patterns more regular on
the disk, rather than seeking back and forth.
Note the "most". Not all filesystems have any relationship
between inode number and location on disk.
This allows the programs to use various simplified versions of
the SHA1 names, eg just say "HEAD" for the SHA1 pointed to by
the .git/HEAD file etc.
For example, this commit has been done with
git-commit-tree $(git-write-tree) -p HEAD
instead of the traditional "$(cat .git/HEAD)" syntax.
Show the types of objects involved in broken links, and don't bother
warning about unreachable tag files (if somebody cares about tags,
they'll use the --tags flag to see them).
With support for parse_object() and tags in the core, fsck_cache can just
call them, and can be simplified a bit.
Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We should _not_ mark a blob object "parsed" just because we
looked it up: it gets marked that way only once we've actually
seen it. Otherwise we can never notice a missing blob.
This ports fsck-cache to use parsing functions. Note that performance
could be improved here by only reading each object once, but this requires
somewhat more complicated flow control.
Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- mark_reachable() can be more generic, marking the reachable revisions
with an arbitrary mask.
- date parsing will parse to a date of 0 rather than ULONG_MAX for the
bad old case, sorting the dates correctly.
It's really a very generic thing: the notion of one sha1 revision
referring to another one. "fsck" uses it for all nodes, and "rev-tree"
only tracks commit-node relationships, but the code was already
the same - now we just make that explicit by moving it to a common
header file.
command line.
"arbitrary" is a bit wrong, since it is limited by the argument
size limit (128kB or so), but let's see if anybody ever cares.
Arguably you should prune your tree before you have a few thousand
dangling heads in your archive.
We can fix it by passing in a file listing if we ever care.
This makes things a lot more efficient, and makes it trivial to do things
like reachability analysis.
Add command line flags to tell what the head is, and whether to warn
about unreachable objects.
Now there is error() for "library" errors and die() for fatal "application"
errors. usage() is now used strictly only for usage errors.
Signed-off-by: Petr Baudis <pasky@ucw.cz>
properly clear the reference count at init time. It happened to work
for me by pure luck.
Until it broke, and my unreferenced commit suddenly looked referenced
again. Fixed.
Which made fsck very quiet about objects it hadn't found. So add
it.
We'll need to make things like these optional, because it's
perfectly ok to have partial history if you don't want it,
and don't want to go backwards. But for development, it's best
to always complain about missing sha1 object files that are
referenced from somewhere else.
This shows that I've lost track of one commit already. Most likely
because I forgot to update the .dircache/HEAD file when doing a
commit, so that the next commit referenced not the top-of-tree, but
the one older commit.
Having dangling commits is fine (in fact, you should always have
at least _one_ dangling commit in the top-of-tree). But it's
good to know about them.
This is totally untested, since we can't actually _write_ things that
way yet, but I'll get to that next, I hope. That should fix the
huge wasted space for kernel-sized tree objects.
Patches from Dave Jones and Ingo Molnar, but since I don't have any
infrastructure in place to use the old patch applicator scripts I
am trying to build up, I ended up fixing the thing by hand instead.
Credit where credit is due, though. Nice to see that people are
taking a look at the project even in this early stage.