kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Junio C Hamano	4cc8d6c62d	add -u: do not fail to resolve a path as deleted After you resolve a conflicted merge to remove the path, "git add -u" failed to record the removal. Instead it errored out by saying that the removed path is not found in the work tree, but that is what the user already knows, and the wanted to record the removal as the resolution, so the error does not make sense. Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Linus Torvalds	a60272b38e	Make 'ce_compare_link()' use the new 'strbuf_readlink()' This simplifies the code, and also makes ce_compare_link now able to handle filesystems with odd 'st_size' return values for symlinks. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Junio C Hamano	63e8dc5b14	read-cache.c: typofix in comment Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Junio C Hamano	331fcb598e	git add --intent-to-add: do not let an empty blob be committed by accident Writing a tree out of an index with an "intent to add" entry is blocked. This implies that you cannot "git commit" from such a state; however you can still do "git commit -a" or "git commit $that_path". Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Junio C Hamano	388b2acd6e	git add --intent-to-add: fix removal of cached emptiness This uses the extended index flag mechanism introduced earlier to mark the entries added to the index via "git add -N" with CE_INTENT_TO_ADD. The logic to detect an "intent to add" entry for the purpose of allowing "git rm --cached $path" is tightened to check not just for a staged empty blob, but with the CE_INTENT_TO_ADD bit. This protects an empty blob that was explicitly added and then modified in the work tree from being dropped with this sequence: $ >empty $ git add empty $ echo "non empty" >empty $ git rm --cached empty Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Junio C Hamano	fa7b3c2f75	checkout: Fix "initial checkout" detection Earlier commit `5521883` (checkout: do not lose staged removal, 2008-09-07) tightened the rule to prevent switching branches from losing local changes, so that staged removal of paths can be protected, while attempting to keep a loophole to still allow a special case of switching out of an un-checked-out state. However, the loophole was made a bit too tight, and did not allow switching from one branch (in an un-checked-out state) to check out another branch. The change to builtin-checkout.c in this commit loosens it to allow this, by not insisting the original commit and the new commit to be the same. It also introduces a new function, is_index_unborn (and an associated macro, is_cache_unborn), to check if the repository is truly in an un-checked-out state more reliably, by making sure that $GIT_INDEX_FILE did not exist when populating the in-core index structure. A few places the earlier commit `5521883` added the check for the initial checkout condition are updated to use this function. Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Jeff King	f55527f802	rm: loosen safety valve for empty files If a file is different between the working tree copy, the index, and the HEAD, then we do not allow it to be deleted without --force. However, this is overly tight in the face of "git add --intent-to-add": $ git add --intent-to-add file $ : oops, I don't actually want to stage that yet $ git rm --cached file error: 'empty' has staged content different from both the file and the HEAD (use -f to force removal) $ git rm -f --cached file Unfortunately, there is currently no way to distinguish between an empty file that has been added and an "intent to add" file. The ideal behavior would be to disallow the former while allowing the latter. This patch loosens the safety valve to allow the deletion only if we are deleting the cached entry and the cached content is empty. This covers the intent-to-add situation, and assumes there is little harm in not protecting users who have legitimately added an empty file. In many cases, the file will still be empty, in which case the safety valve does not trigger anyway (since the content remains untouched in the working tree). Otherwise, we do remove the fact that no content was staged, but given that the content is by definition empty, it is not terribly difficult for a user to recreate it. However, we still document the desired behavior in the form of two tests. One checks the correct removal of an intent-to-add file. The other checks that we still disallow removal of empty files, but is marked as expect_failure to indicate this compromise. If the intent-to-add feature is ever extended to differentiate between normal empty files and intent-to-add files, then the safety valve can be re-tightened. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Junio C Hamano	d1a43f2aa4	reset --hard/read-tree --reset -u: remove unmerged new paths When aborting a failed merge that has brought in a new path using "git reset --hard" or "git read-tree --reset -u", we used to first forget about the new path (via read_cache_unmerged) and then matched the working tree to what is recorded in the index, thus ending up leaving the new path in the work tree. Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Jeff King	98fa473887	refactor handling of "other" files in ls-files and status When the "git status" display code was originally converted to C, we copied the code from ls-files to discover whether a pathname returned by read_directory was an "other", or untracked, file. Much later, `5698454e` updated the code in ls-files to handle some new cases caused by gitlinks. This left the code in wt-status.c broken: it would display submodule directories as untracked directories. Nobody noticed until now, however, because unless status.showUntrackedFiles was set to "all", submodule directories were not actually reported by read_directory. So the bug was only triggered in the presence of a submodule _and_ this config option. This patch pulls the ls-files code into a new function, cache_name_is_other, and uses it in both places. This should leave the ls-files functionality the same and fix the bug in status. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Nguyễn Thái Ngọc Duy	06aaaa0bf7	Extend index to save more flags The on-disk format of index only saves 16 bit flags, nearly all have been used. The last bit (CE_EXTENDED) is used to for future extension. This patch extends index entry format to save more flags in future. The new entry format will be used when CE_EXTENDED bit is 1. Because older implementation may not understand CE_EXTENDED bit and misread the new format, if there is any extended entry in index, index header version will turn 3, which makes it incompatible for older git. If there is none, header version will return to 2 again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	16 years ago
Brandon Casey	f285a2d7ed	Replace calls to strbuf_init(&foo, 0) with STRBUF_INIT initializer Many call sites use strbuf_init(&foo, 0) to initialize local strbuf variable "foo" which has not been accessed since its declaration. These can be replaced with a static initialization using the STRBUF_INIT macro which is just as readable, saves a function call, and takes up fewer lines. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	16 years ago
Dmitry Potapov	7e7abea96b	print an error message for invalid path If verification of path failed, it is always better to print an error message saying this than relying on the caller function to print a meaningful error message (especially when the callee already prints error message for another situation). Because the callers of add_index_entry_with_check() did not print any error message, it resulted that the user would not notice the problem when checkout of an invalid path failed. Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	16 years ago
Nicolas Pitre	9126f0091f	fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit `684ec6c63c` whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	16 years ago
Junio C Hamano	394258190c	git-add --intent-to-add (-N) This adds "--intent-to-add" option to "git add". This is to let the system know that you will tell it the final contents to be staged later, iow, just be aware of the presense of the path with the type of the blob for now. It is implemented by staging an empty blob as the content. With this sequence: $ git reset --hard $ edit newfile $ git add -N newfile $ edit newfile oldfile $ git diff the diff will show all changes relative to the current commit. Then you can do: $ git commit -a ;# commit everything or $ git commit oldfile ;# only oldfile, newfile not yet added to pretend you are working with an index-free system like CVS. Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Junio C Hamano	913e0e99b6	unpack_trees(): protect the handcrafted in-core index from read_cache() unpack_trees() rebuilds the in-core index from scratch by allocating a new structure and finishing it off by copying the built one to the final index. The resulting in-core index is Ok for most use, but read_cache() does not recognize it as such. The function is meant to be no-op if you already have loaded the index, until you call discard_cache(). This change the way read_cache() detects an already initialized in-core index, by introducing an extra bit, and marks the handcrafted in-core index as initialized, to avoid this problem. A better fix in the longer term would be to change the read_cache() API so that it will always discard and re-read from the on-disk index to avoid confusion. But there are higher level API that have relied on the current semantics, and they and their users all need to get converted, which is outside the scope of 'maint' track. An example of such a higher level API is write_cache_as_tree(), which is used by git-write-tree as well as later Porcelains like git-merge, revert and cherry-pick. In the longer term, we should remove read_cache() from there and add one to cmd_write_tree(); other callers expect that the in-core index they prepared is what gets written as a tree so no other change is necessary for this particular codepath. The original version of this patch marked the index by pointing an otherwise wasted malloc'ed memory with o->result.alloc, but this version uses Linus's idea to use a new "initialized" bit, which is conceptually much cleaner. Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Junio C Hamano	64ca23afda	discard_cache: reset lazy name_hash bit We forgot to reset name_hash_initialized bit when discarding the in-core index. Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Junio C Hamano	16ce2e4c8f	index: future proof for "extended" index entries We do not have any more bits in the on-disk index flags word, but we would need to have more in the future. Use the last remaining bits as a signal to tell us that the index entry we are looking at is an extended one. Since we do not understand the extended format yet, we will just error out when we see it. Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Junio C Hamano	c70115b4b1	Teach gitlinks to ie_modified() and ce_modified_check_fs() The ie_modified() function is the workhorse for refresh_cache_entry(), i.e. checking if an index entry that is stat-dirty actually has changes. After running quicker check to compare cached stat information with results from the latest lstat(2) to answer "has modification" early, the code goes on to check if there really is a change by comparing the staged data with what is on the filesystem by asking ce_modified_check_fs(). However, this function always said "no change" for any gitlinks that has a directory at the corresponding path. This made ie_modified() to miss actual changes in the subproject. The patch fixes this first by modifying an existing short-circuit logic before calling the ce_modified_check_fs() function. It knows that for any filesystem entity to which ie_match_stat() says its data has changed, if its cached size is nonzero then the contents cannot match, which is a correct optimization only for blob objects. We teach gitlink objects to this special case, as we already know that any gitlink that ie_match_stat() says is modified is indeed modified at this point in the codepath. With the above change, we could leave ce_modified_check_fs() broken, but it also futureproofs the code by teaching it to use ce_compare_gitlink(), instead of assuming (incorrectly) that any directory is unchanged. Originally noticed by Alex Riesen on Cygwin. Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Alex Riesen	1ce4790bf5	Make use of stat.ctime configurable A new configuration variable 'core.trustctime' is introduced to allow ignoring st_ctime information when checking if paths in the working tree has changed, because there are situations where it produces too much false positives. Like when file system crawlers keep changing it when scanning and using the ctime for marking scanned files. The default is to notice ctime changes. Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Petr Baudis	81dc2307d0	git-mv: Keep moved index entries inact The rewrite of git-mv from a shell script to a builtin was perhaps a little too straightforward: the git add and git rm queues were emulated directly, which resulted in a rather complicated code and caused an inconsistent behaviour when moving dirty index entries; git mv would update the entry based on working tree state, except in case of overwrites, where the new entry would still have sha1 of the old file. This patch introduces rename_index_entry_at() into the index toolkit, which will rename an entry while removing any entries the new entry might render duplicate. This is then used in git mv instead of all the file queues, resulting in a major simplification of the code and an inevitable change in git mv -n output format. Also the code used to refuse renaming overwriting symlink with a regular file and vice versa; there is no need for that. A few new tests have been added to the testsuite to reflect this change. Signed-off-by: Petr Baudis <pasky@suse.cz> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Junio C Hamano	041aee31be	builtin-add.c: restructure the code for maintainability A private function add_files_to_cache() in builtin-add.c was borrowed by checkout and commit re-implementors without getting properly refactored to more library-ish place. This does the refactoring. Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Junio C Hamano	d14e7407b3	"needs update" considered harmful "git update-index --refresh", "git reset" and "git add --refresh" have reported paths that have local modifications as "needs update" since the beginning of git. Although this is logically correct in that you need to update the index at that path before you can commit that change, it is now becoming more and more clear, especially with the continuous push for user friendliness since 1.5.0 series, that the message is suboptimal. After all, the change may be something the user might want to get rid of, and "updating" would be absolutely a wrong thing to do if that is the case. I prepared two alternatives to solve this. Both aim to reword the message to more neutral "locally modified". This patch is a more intrusive variant that changes the message for only Porcelain commands ("add" and "reset") while keeping the plumbing "update-index" intact. Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Junio C Hamano	3bf0dd1f4e	read-cache.c: typofix Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Miklos Vajna	e46bbcf6e8	Move read_cache_unmerged() to read-cache.c builtin-read-tree has a read_cache_unmerged() which is useful for other builtins, for example builtin-merge uses it as well. Move it to read-cache.c to avoid code duplication. Signed-off-by: Miklos Vajna <vmiklos@frugalware.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Linus Torvalds	f49c2c22fe	racy-git: an empty blob has a fixed object name We use size=0 as the magic token to say the entry is known to be racily clean, but a sequence that does: - update the path with a non-empty blob and write the index; - update an unrelated path and write the index -- this smudges the above entry; - truncate the path to size zero. would make both the size field for the path in the index and the size on the filesystem zero. We should not mistake it as a clean index entry. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Marius Storm-Olsen	aa9349d449	Add shortcut in refresh_cache_ent() for marked entries. When a cache entry has been marked as CE_VALID, the user has promised us that any change in the work tree does not matter. Just mark the entry as up-to-date, and continue. Signed-off-by: Marius Storm-Olsen <marius@trolltech.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Junio C Hamano	38ed1d89f7	"git-add -n -u" should not add but just report Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Johannes Schindelin	5fdeacb0ca	Teach update-index about --ignore-submodules Like with the diff machinery, update-index should sometimes just ignore submodules (e.g. to determine a clean state before a rebase). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Alex Riesen	960b8ad1b1	Make the exit code of add_file_to_index actually useful Update the programs which used the function (as add_file_to_cache). Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Linus Torvalds	d177cab048	Avoid some unnecessary lstat() calls The commit sequence used to do if (file_exists(p->path)) add_file_to_cache(p->path, 0); where both "file_exists()" and "add_file_to_cache()" needed to do a lstat() on the path to do their work. This cuts down 'lstat()' calls for the partial commit case by two for each path we know about (because we do this twice per path). Just move the lstat() to the caller instead (that's all that "file_exists()" really does), and pass the stat information down to the add_to_cache() function. This essentially makes 'add_to_index()' the core function that adds a path to the index, getting the index pointer, the pathname and the stat information as arguments. There are then shorthand helper functions that use this core function: - 'add_to_cache()' is just 'add_to_index()' with the default index - 'add_file_to_cache/index()' is the same, but does the lstat() call itself, so you can pass just the pathname if you don't already have the stat information available. So old users of the 'add_file_to_xyzzy()' are essentially left unchanged, and this just exposes the more generic helper function that can take existing stat information into account. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Junio C Hamano	050288d52d	is_racy_timestamp(): do not check timestamp for gitlinks Because we do not even check the timestamp to determie if a gitlink is up to date or not, triggering the racy-timestamp check for gitlinks does not make sense. This fixes the recently added test in t7506. Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Junio C Hamano	e06c43c795	write_index(): optimize ce_smudge_racily_clean_entry() calls with CE_UPTODATE When writing the index out, we need to check the work tree again to see if an entry whose timestamp indicates that it could be "racily clean", in order to smudge it if it is stat-clean but with modified contents. However, we can skip this step for entries marked with CE_UPTODATE, which are known to be the really clean (i.e. the one we already have checked when we prepared the index). This will reduce lstat(2) calls necessary in git-status. Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Linus Torvalds	1102952b45	Make git-add behave more sensibly in a case-insensitive environment This expands on the previous patch, and allows "git add" to sanely handle a filename that has changed case, keeping the case in the index constant, and avoiding aliases. In particular, if you have an index entry called "File", but the checked-out tree is case-corrupted and has an entry called "file" instead, doing a git add . (or naming "file" explicitly) will automatically notice that we have an alias, and will replace the name "file" with the existing index capitalization (ie "File"). However, if we actually have both a file called "File" and one called "file", and they don't have the same lstat() information (ie we're on a case-sensitive filesystem but have the "core.ignorecase" flag set), we will error out if we try to add them both. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Linus Torvalds	6835550def	When adding files to the index, add support for case-independent matches This simplifies the matching case of "I already have this file and it is up-to-date" and makes it do the right thing in the face of case-insensitive aliases. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Linus Torvalds	96872bc200	Move name hashing functions into a file of its own It's really totally separate functionality, and if we want to start doing case-insensitive hash lookups, I'd rather do it when it's separated out. It also renames "remove_index_entry()" to "remove_name_hash()", because that really describes the thing better. It doesn't actually remove the index entry, that's done by "remove_index_entry_at()", which is something very different, despite the similarity in names. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Linus Torvalds	d1f128b050	Add 'const' where appropriate to index handling functions This is in an effort to make the source index of 'unpack_trees()' as being const, and thus making the compiler help us verify that we only access it for reading. The constification also extended to some of the hashing helpers that get called indirectly. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Linus Torvalds	0ab9e1e8cd	Add 'df_name_compare()' helper function This new helper is identical to base_name_compare(), except it compares conflicting directory/file entries as equal in order to help handling DF conflicts (thus the name). Note that while a directory name compares as equal to a regular file with the new helper, they then individually compare _differently_ to a filename that has a dot after the basename (because '\0' < '.' < '/'). So a directory called "foo/" will compare equal to a file "foo", even though "foo.c" will compare after "foo" and before "foo/" This will be used by routines that want to traverse the git namespace but then handle conflicting entries together when possible. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Linus Torvalds	d070e3a31b	Name hash fixups: export (and rename) remove_hash_entry This makes the name hash removal function (which really just sets the bit that disables lookups of it) available to external routines, and makes read_cache_unmerged() use it when it drops an unmerged entry from the index. It's renamed to remove_index_entry(), and we drop the (unused) 'istate' argument. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Linus Torvalds	a22c637124	Fix name re-hashing semantics We handled the case of removing and re-inserting cache entries badly, which is something that merging commonly needs to do (removing the different stages, and then re-inserting one of them as the merged state). We even had a rather ugly special case for this failure case, where replace_index_entry() basically turned itself into a no-op if the new and the old entries were the same, exactly because the hash routines didn't handle it on their own. So what this patch does is to not just have the UNHASHED bit, but a HASHED bit too, and when you insert an entry into the name hash, that involves: - clear the UNHASHED bit, because now it's valid again for lookup (which is really all that UNHASHED meant) - if we're being lazy, we're done here (but we still want to clear the UNHASHED bit regardless of lazy mode, since we can become unlazy later, and so we need the UNHASHED bit to always be set correctly, even if we never actually insert the entry into the hash list) - if it was already hashed, we just leave it on the list - otherwise mark it HASHED and insert it into the list this all means that unhashing and rehashing a name all just works automatically. Obviously, you cannot change the name of an entry (that would be a serious bug), but nothing can validly do that anyway (you'd have to allocate a new struct cache_entry anyway since the name length could change), so that's not a new limitation. The code actually gets simpler in many ways, although the lazy hashing does mean that there are a few odd cases (ie something can be marked unhashed even though it was never on the hash in the first place, and isn't actually marked hashed!). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Daniel Barkalow	94a5728cfb	Library function to check for unmerged index entries It's small, but it was in three places already, so it should be in the library. Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>	17 years ago
Junio C Hamano	9cb76b8cdc	lazy index hashing This delays the hashing of index names until it becomes necessary for the first time. Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Linus Torvalds	cf558704fb	Create pathname-based hash-table lookup into index This creates a hash index of every single file added to the index. Right now that hash index isn't actually used for much: I implemented a "cache_name_exists()" function that uses it to efficiently look up a filename in the index without having to do the O(logn) binary search, but quite frankly, that's not why this patch is interesting. No, the whole and only reason to create the hash of the filenames in the index is that by modifying the hash function, you can fairly easily do things like making it always hash equivalent names into the same bucket. That, in turn, means that suddenly questions like "does this name exist in the index under an _equivalent_ name?" becomes much much cheaper. Guiding principles behind this patch: - it shouldn't be too costly. In fact, my primary goal here was to actually speed up "git commit" with a fully populated kernel tree, by being faster at checking whether a file already existed in the index. I did succeed, but only barely: Best before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.255s user 0m0.168s sys 0m0.088s Best after: [torvalds@woody linux]$ time ~/git/git commit > /dev/null real 0m0.233s user 0m0.144s sys 0m0.088s so some things are actually faster (~8%). Caveat: that's really the best case. Other things are invariably going to be slightly slower, since we populate that index cache, and quite frankly, few things really use it to look things up. That said, the cost is really quite small. The worst case is probably doing a "git ls-files", which will do very little except puopulate the index, and never actually looks anything up in it, just lists it. Before: [torvalds@woody linux]$ time git ls-files > /dev/null real 0m0.016s user 0m0.016s sys 0m0.000s After: [torvalds@woody linux]$ time ~/git/git ls-files > /dev/null real 0m0.021s user 0m0.012s sys 0m0.008s and while the thing has really gotten relatively much slower, we're still talking about something almost unmeasurable (eg 5ms). And that really should be pretty much the worst case. So we lose 5ms on one "benchmark", but win 22ms on another. Pick your poison - this patch has the advantage that it will _likely_ speed up the cases that are complex and expensive more than it slows down the cases that are already so fast that nobody cares. But if you look at relative speedups/slowdowns, it doesn't look so good. - It should be simple and clean The code may be a bit subtle (the reasons I do hash removal the way I do etc), but it re-uses the existing hash.c files, so it really is fairly small and straightforward apart from a few odd details. Now, this patch on its own doesn't really do much, but I think it's worth looking at, if only because if done correctly, the name hashing really can make an improvement to the whole issue of "do we have a filename that looks like this in the index already". And at least it gets real testing by being used even by default (ie there is a real use-case for it even without any insane filesystems). NOTE NOTE NOTE! The current hash is a joke. I'm ashamed of it, I'm just not ashamed of it enough to really care. I took all the numbers out of my nether regions - I'm sure it's good enough that it works in practice, but the whole point was that you can make a really much fancier hash that hashes characters not directly, but by their upper-case value or something like that, and thus you get a case-insensitive hash, while still keeping the name and the index itself totally case sensitive. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Junio C Hamano	6d91da6d3c	read-cache.c: introduce is_racy_timestamp() helper This moves a common boolean expression into a helper function, and makes the comparison between filesystem timestamp and index timestamp done in the function in line with the other places. st.st_mtime should be casted to (unsigned int) when compared to an index timestamp ce_mtime. Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Junio C Hamano	077c48df8a	read-cache.c: fix a couple more CE_REMOVE conversion It is a D/F conflict if you want to add "foo/bar" to the index when "foo" already exists. Also it is a conflict if you want to add a file "foo" when "foo/bar" exists. An exception is when the existing entry is there only to mark "I used to be here but I am being removed". This is needed for operations such as "git read-tree -m -u" that update the index and then reflect the result to the work tree --- we need to remember what to remove somewhere, and we use the index for that. In such a case, an existing file "foo" is being removed and we can create "foo/" directory and hang "bar" underneath it without any conflict. We used to use (ce->ce_mode == 0) to mark an entry that is being removed, but (CE_REMOVE & ce->ce_flags) is used for that purpose these days. An earlier commit forgot to convert the logic in the code that checks D/F conflict condition. The old code knew that "to be removed" entries cannot be at higher stage and actively checked that condition, but it was an unnecessary check. This patch removes the extra check as well. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Junio C Hamano	eadb583134	Avoid running lstat(2) on the same cache entry. Aside from the lstat(2) done for work tree files, there are quite many lstat(2) calls in refname dwimming codepath. This patch is not about reducing them. * It adds a new ce_flag, CE_UPTODATE, that is meant to mark the cache entries that record a regular file blob that is up to date in the work tree. If somebody later walks the index and wants to see if the work tree has changes, they do not have to be checked with lstat(2) again. * fill_stat_cache_info() marks the cache entry it just added with CE_UPTODATE. This has the effect of marking the paths we write out of the index and lstat(2) immediately as "no need to lstat -- we know it is up-to-date", from quite a lot fo callers: - git-apply --index - git-update-index - git-checkout-index - git-add (uses add_file_to_index()) - git-commit (ditto) - git-mv (ditto) * refresh_cache_ent() also marks the cache entry that are clean with CE_UPTODATE. * write_index is changed not to write CE_UPTODATE out to the index file, because CE_UPTODATE is meant to be transient only in core. For the same reason, CE_UPDATE is not written to prevent an accident from happening. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	17 years ago
Junio C Hamano	7fec10b7f4	index: be careful when handling long names We currently use lower 12-bit (masked with CE_NAMEMASK) in the ce_flags field to store the length of the name in cache_entry, without checking the length parameter given to create_ce_flags(). This can make us store incorrect length. Currently we are mostly protected by the fact that many codepaths first copy the path in a variable of size PATH_MAX, which typically is 4096 that happens to match the limit, but that feels like a bug waiting to happen. Besides, that would not allow us to shorten the width of CE_NAMEMASK to use the bits for new flags. This redefines the meaning of the name length stored in the cache_entry. A name that does not fit is represented by storing CE_NAMEMASK in the field, and the actual length needs to be computed by actually counting the bytes in the name[] field. This way, only the unusually long paths need to suffer. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	17 years ago
Linus Torvalds	7a51ed66f6	Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	17 years ago
Junio C Hamano	fb63d7f889	git-add: make the entry stat-clean after re-adding the same contents Earlier in commit `0781b8a9b2` (add_file_to_index: skip rehashing if the cached stat already matches), add_file_to_index() were taught not to re-add the path if it already matches the index. The change meant well, but was not executed quite right. It used ie_modified() to see if the file on the work tree is really different from the index, and skipped adding the contents if the function says "not modified". This was wrong. There are three possible comparison results between the index and the file in the work tree: - with lstat(2) we _know_ they are different. E.g. if the length or the owner in the cached stat information is different from the length we just obtained from lstat(2), we can tell the file is modified without looking at the actual contents. - with lstat(2) we _know_ they are the same. The same length, the same owner, the same everything (but this has a twist, as described below). - we cannot tell from lstat(2) information alone and need to go to the filesystem to actually compare. The last case arises from what we call 'racy git' situation, that can be caused with this sequence: $ echo hello >file $ git add file $ echo aeiou >file ;# the same length If the second "echo" is done within the same filesystem timestamp granularity as the first "echo", then the timestamp recorded by "git add" and the timestamp we get from lstat(2) will be the same, and we can mistakenly say the file is not modified. The path is called 'racily clean'. We need to reliably detect racily clean paths are in fact modified. To solve this problem, when we write out the index, we mark the index entry that has the same timestamp as the index file itself (that is the time from the point of view of the filesystem) to tell any later code that does the lstat(2) comparison not to trust the cached stat info, and ie_modified() then actually goes to the filesystem to compare the contents for such a path. That's all good, but it should not be used for this "git add" optimization, as the goal of "git add" is to actually update the path in the index and make it stat-clean. With the false optimization, we did _not_ cause any data loss (after all, what we failed to do was only to update the cached stat information), but it made the following sequence leave the file stat dirty: $ echo hello >file $ git add file $ echo hello >file ;# the same contents $ git add file The solution is not to use ie_modified() which goes to the filesystem to see if it is really clean, but instead use ie_match_stat() with "assume racily clean paths are dirty" option, to force re-adding of such a path. There was another problem with "git add -u". The codepath shares the same issue when adding the paths that are found to be modified, but in addition, it asked "git diff-files" machinery run_diff_files() function (which is "git diff-files") to list the paths that are modified. But "git diff-files" machinery uses the same ie_modified() call so that it does not report racily clean _and_ actually clean paths as modified, which is not what we want. The patch allows the callers of run_diff_files() to pass the same "assume racily clean paths are dirty" option, and makes "git-add -u" codepath to use that option, to discover and re-add racily clean _and_ actually clean paths. We could further optimize on top of this patch to differentiate the case where the path really needs re-adding (i.e. the content of the racily clean entry was indeed different) and the case where only the cached stat information needs to be refreshed (i.e. the racily clean entry was actually clean), but I do not think it is worth it. This patch applies to maint and all the way up. Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Junio C Hamano	4bd5b7dacc	ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com>	17 years ago
Linus Torvalds	cd8ae20195	git-blame shouldn't crash if run in an unmerged tree If we are in the middle of resolving a merge conflict there may be one or more files whose entries in the index represent an unmerged state (index entries in the higher-order stages). Attempting to run git-blame on any file in such a working directory resulted in "fatal: internal error: ce_mode is 0" as we use the magic marker for an unmerged entry is 0 (set up by things like diff-lib.c's do_diff_cache() and builtin-read-tree.c's read_tree_unmerged()) and the ce_match_stat_basic() function gets upset about this. I'm not entirely sure that the whole "ce_mode = 0" case is a good idea to begin with, and maybe the right thing to do is to remove that horrid freakish special case, but removing the internal error seems to be the simplest fix for now. Linus [sp: Thanks to Björn Steinbrink for the test case] Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	17 years ago

1 2 3 4 5

208 Commits (b92891f9783ae197bb84b90d8404ad08c3875fa1)