|
|
|
Date: Fri, 9 Nov 2007 08:28:38 -0800 (PST)
|
|
|
|
From: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
Subject: corrupt object on git-gc
|
|
|
|
Abstract: Some tricks to reconstruct blob objects in order to fix
|
|
|
|
a corrupted repository.
|
|
|
|
Content-type: text/asciidoc
|
|
|
|
|
|
|
|
How to recover a corrupted blob object
|
|
|
|
======================================
|
|
|
|
|
|
|
|
-----------------------------------------------------------
|
|
|
|
On Fri, 9 Nov 2007, Yossi Leybovich wrote:
|
|
|
|
>
|
|
|
|
> Did not help still the repository look for this object?
|
|
|
|
> Any one know how can I track this object and understand which file is it
|
|
|
|
-----------------------------------------------------------
|
|
|
|
|
|
|
|
So exactly *because* the SHA-1 hash is cryptographically secure, the hash
|
|
|
|
itself doesn't actually tell you anything, in order to fix a corrupt
|
|
|
|
object you basically have to find the "original source" for it.
|
|
|
|
|
|
|
|
The easiest way to do that is almost always to have backups, and find the
|
|
|
|
same object somewhere else. Backups really are a good idea, and Git makes
|
|
|
|
it pretty easy (if nothing else, just clone the repository somewhere else,
|
|
|
|
and make sure that you do *not* use a hard-linked clone, and preferably
|
|
|
|
not the same disk/machine).
|
|
|
|
|
|
|
|
But since you don't seem to have backups right now, the good news is that
|
|
|
|
especially with a single blob being corrupt, these things *are* somewhat
|
|
|
|
debuggable.
|
|
|
|
|
|
|
|
First off, move the corrupt object away, and *save* it. The most common
|
|
|
|
cause of corruption so far has been memory corruption, but even so, there
|
|
|
|
are people who would be interested in seeing the corruption - but it's
|
|
|
|
basically impossible to judge the corruption until we can also see the
|
|
|
|
original object, so right now the corrupt object is useless, but it's very
|
|
|
|
interesting for the future, in the hope that you can re-create a
|
|
|
|
non-corrupt version.
|
|
|
|
|
|
|
|
-----------------------------------------------------------
|
|
|
|
So:
|
|
|
|
|
|
|
|
> ib]$ mv .git/objects/4b/9458b3786228369c63936db65827de3cc06200 ../
|
|
|
|
-----------------------------------------------------------
|
|
|
|
|
|
|
|
This is the right thing to do, although it's usually best to save it under
|
|
|
|
it's full SHA-1 name (you just dropped the "4b" from the result ;).
|
|
|
|
|
|
|
|
Let's see what that tells us:
|
|
|
|
|
|
|
|
-----------------------------------------------------------
|
|
|
|
> ib]$ git-fsck --full
|
|
|
|
> broken link from tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8
|
|
|
|
> to blob 4b9458b3786228369c63936db65827de3cc06200
|
|
|
|
> missing blob 4b9458b3786228369c63936db65827de3cc06200
|
|
|
|
-----------------------------------------------------------
|
|
|
|
|
|
|
|
Ok, I removed the "dangling commit" messages, because they are just
|
|
|
|
messages about the fact that you probably have rebased etc, so they're not
|
|
|
|
at all interesting. But what remains is still very useful. In particular,
|
|
|
|
we now know which tree points to it!
|
|
|
|
|
|
|
|
Now you can do
|
|
|
|
|
|
|
|
git ls-tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8
|
|
|
|
|
|
|
|
which will show something like
|
|
|
|
|
|
|
|
100644 blob 8d14531846b95bfa3564b58ccfb7913a034323b8 .gitignore
|
|
|
|
100644 blob ebf9bf84da0aab5ed944264a5db2a65fe3a3e883 .mailmap
|
|
|
|
100644 blob ca442d313d86dc67e0a2e5d584b465bd382cbf5c COPYING
|
|
|
|
100644 blob ee909f2cc49e54f0799a4739d24c4cb9151ae453 CREDITS
|
|
|
|
040000 tree 0f5f709c17ad89e72bdbbef6ea221c69807009f6 Documentation
|
|
|
|
100644 blob 1570d248ad9237e4fa6e4d079336b9da62d9ba32 Kbuild
|
|
|
|
100644 blob 1c7c229a092665b11cd46a25dbd40feeb31661d9 MAINTAINERS
|
|
|
|
...
|
|
|
|
|
|
|
|
and you should now have a line that looks like
|
|
|
|
|
|
|
|
10064 blob 4b9458b3786228369c63936db65827de3cc06200 my-magic-file
|
|
|
|
|
|
|
|
in the output. This already tells you a *lot* it tells you what file the
|
|
|
|
corrupt blob came from!
|
|
|
|
|
|
|
|
Now, it doesn't tell you quite enough, though: it doesn't tell what
|
|
|
|
*version* of the file didn't get correctly written! You might be really
|
|
|
|
lucky, and it may be the version that you already have checked out in your
|
|
|
|
working tree, in which case fixing this problem is really simple, just do
|
|
|
|
|
|
|
|
git hash-object -w my-magic-file
|
|
|
|
|
|
|
|
again, and if it outputs the missing SHA-1 (4b945..) you're now all done!
|
|
|
|
|
|
|
|
But that's the really lucky case, so let's assume that it was some older
|
|
|
|
version that was broken. How do you tell which version it was?
|
|
|
|
|
|
|
|
The easiest way to do it is to do
|
|
|
|
|
|
|
|
git log --raw --all --full-history -- subdirectory/my-magic-file
|
|
|
|
|
|
|
|
and that will show you the whole log for that file (please realize that
|
|
|
|
the tree you had may not be the top-level tree, so you need to figure out
|
|
|
|
which subdirectory it was in on your own), and because you're asking for
|
|
|
|
raw output, you'll now get something like
|
|
|
|
|
|
|
|
commit abc
|
|
|
|
Author:
|
|
|
|
Date:
|
|
|
|
..
|
|
|
|
:100644 100644 4b9458b... newsha... M somedirectory/my-magic-file
|
|
|
|
|
|
|
|
|
|
|
|
commit xyz
|
|
|
|
Author:
|
|
|
|
Date:
|
|
|
|
|
|
|
|
..
|
|
|
|
:100644 100644 oldsha... 4b9458b... M somedirectory/my-magic-file
|
|
|
|
|
|
|
|
and this actually tells you what the *previous* and *subsequent* versions
|
|
|
|
of that file were! So now you can look at those ("oldsha" and "newsha"
|
|
|
|
respectively), and hopefully you have done commits often, and can
|
|
|
|
re-create the missing my-magic-file version by looking at those older and
|
|
|
|
newer versions!
|
|
|
|
|
|
|
|
If you can do that, you can now recreate the missing object with
|
|
|
|
|
|
|
|
git hash-object -w <recreated-file>
|
|
|
|
|
|
|
|
and your repository is good again!
|
|
|
|
|
|
|
|
(Btw, you could have ignored the fsck, and started with doing a
|
|
|
|
|
|
|
|
git log --raw --all
|
|
|
|
|
|
|
|
and just looked for the sha of the missing object (4b9458b..) in that
|
|
|
|
whole thing. It's up to you - Git does *have* a lot of information, it is
|
|
|
|
just missing one particular blob version.
|
|
|
|
|
|
|
|
Trying to recreate trees and especially commits is *much* harder. So you
|
|
|
|
were lucky that it's a blob. It's quite possible that you can recreate the
|
|
|
|
thing.
|
|
|
|
|
|
|
|
Linus
|