Teach oid_object_info_extended() to support a new flag that inhibits
fetching of missing objects. This is equivalent to setting
fetch_is_missing to 0, calling oid_object_info_extended(), then setting
fetch_if_missing to whatever it was before. Update unpack-trees.c to use
this new flag instead of repeatedly setting fetch_if_missing.
This new flag complicates things slightly in that there are now 2 ways
to do the same thing. But this eliminates the need to repeatedly set a
global variable, and more importantly, allows prefetching to be done in
parallel (in the future); hence, this patch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The gently flag was added in 17e4642667 (Add flag to make unpack_trees()
not print errors. - 2008-02-07) to suppress error messages. The name
"gently" does not quite express that. Granted, being quiet is gentle but
it could mean not performing some other actions. Rename the flag to
"quiet" to be more on point.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This basically follows the footsteps of 6a143aa2b2 (checkout -m:
attempt merge when deletion of path was staged - 2014-08-12) where
there gently check is moved inside reject_merge() so that callers do
not accidentally forget it.
add_rejected_path() has the same usage pattern. All call sites check
gently first, then decide to call add_rejected_path() if needed. Move
the check inside.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip found out that 'git checkout -f <branch>' does not restore
conflict/unmerged files correctly. All tracked files should be taken
from <branch> and all non-zero stages removed. Most of this is true,
except that the final file could be in stage one instead of zero.
"checkout -f" (among other commands) does this with one-way merge, which
is supposed to take stat info from the index and everything else from
the given tree. The add_entry(.., old, ...) call in oneway_merge()
though will keep stage index from the index.
This is normally not a problem if the entry from the index is
normal (stage #0
entry as "old" variable, which gets recorded in
the final index. Fix it by clearing stage mask.
This bug probably comes from b5b425074e (git-read-tree: make one-way
merge also honor the "update" flag, 2005-06-07). Before this commit, we
may create the final ("dst") index entry from the one in index, but we
do clear CE_STAGEMASK.
I briefly checked two- and three-way merge functions. I think we don't
have the same problem in those.
Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The verify_clean_subdirectory() helper takes an error_type parameter
from the caller, but doesn't actually use it. Instead, when it calls
add_rejected_path() it passes NOT_UPTODATE_DIR, its own custom error
type which is more specific than what the caller provides. Likewise for
verify_clean_submodule(), which always passes WOULD_LOSE_SUBMODULE.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We pull the names from the existing index rather than the tree entry,
which is after all the point of this function. Let's drop the unused
"names" parameter.
Note that we leave the "nr_names" parameter, as it tells us how many
trees we are traversing (and thus how many index stages to set up).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a post-index-change hook that is invoked after the index is written in
do_write_locked_index().
This hook is meant primarily for notification, and cannot affect
the outcome of git commands that trigger the index write.
The hook is passed a flag to indicate whether the working directory was
updated or not and a flag indicating if a skip-worktree bit could have
changed. These flags enable the hook to optimize its response to the
index change notification.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, index compat macros are off from now on, because they
could hide the_index dependency.
Only those in builtin can use it.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing a tree, we read the object ID directly out of the tree
buffer. This is normally fine, but such an object ID cannot be used with
oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1,
there may not be that many bytes to copy.
Instead, store the object ID in a separate struct member. Since we can
no longer efficiently compute the path length, store that information as
well in struct name_entry. Ensure we only copy the object ID into the
new buffer if the path length is nonzero, as some callers will pass us
an empty path with no object ID following it, and we will not want to
read past the end of the buffer.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When marking cache entries for removal, and later removing them all at
once using 'remove_marked_cache_entries()', cache entries currently
have to be invalidated manually in the cache tree and in the untracked
cache.
Add an invalidate flag to the function. With the flag set, the
function will take care of invalidating the path in the cache tree and
in the untracked cache.
Note that the current callsites already do the invalidation properly
in other places, so we're just passing 0 from there to keep the status
quo.
This will be useful in a subsequent commit.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Factor out the 'unlink_entry()' function from unpack-trees.c to
entry.c. It will be used in other places as well in subsequent
steps.
As it's no longer a static function, also move the documentation to
the header file to make it more discoverable.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to support :(attr) when matching pathspec on a tree,
tree_entry_interesting() needs to take an index (because
git_check_attr() needs it). This is the preparation step for it. This
also makes it clearer what index we fall back to when looking up
attributes during an unpack-trees operation: the source index.
This also fixes revs->pruning.repo initialization that should have
been done in 2abf350385 (revision.c: remove implicit dependency on
the_index - 2018-09-21). Without it, skip_uninteresting() will
dereference a NULL pointer through this call chain
get_revision(revs)
get_revision_internal
get_revision_1
try_to_simplify_commit
rev_compare_tree
diff_tree_oid(..., &revs->pruning)
ll_diff_tree_oid
diff_tree_paths
ll_diff_tree
skip_uninteresting
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the problems with "git checkout" is that it does so many
different things and could confuse people specially when we fail to
handle ambiguation correctly.
One way to help with that is tell the user what sort of operation is
actually carried out. When switching branches, we always print
something unless --quiet, either
- "HEAD is now at ..."
- "Reset branch ..."
- "Already on ..."
- "Switched to and reset ..."
- "Switched to a new branch ..."
- "Switched to branch ..."
Checking out paths however is silent. Print something so that if we
got the user intention wrong, they won't waste too much time to find
that out. For the remaining cases of checkout we now print either
- "Checked out ... paths out of the index"
- "Checked out ... paths out of <abbrev hash>"
Since the purpose of printing this is to help disambiguate. Only do it
when "--" is missing.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This case is more interesting than other boring "remove the_repo"
commits because while we need access to the object database, we cannot
simply use r->index because unpack-trees.c can operate on a temporary
index, not $GIT_DIR/index. Ideally we should be able to pass an object
database to lookup_tree() but that ship has sailed.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
it is unconditionally initialized a few lines below
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are fetch_object() and fetch_objects() helpers in
fetch-object.h; as the latter takes "struct oid_array",
the former cannot be made into a thin wrapper around the
latter without an extra allocation and set-up cost.
Update fetch_objects() to take an array of "struct object_id"
and number of elements in it as separate parameters, remove
fetch_object(), and adjust all existing callers of these
functions to use the new fetch_objects().
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the more restrictive oideq() should, in the long run,
give the compiler more opportunities to optimize these
callsites. For now, this conversion should be a complete
noop with respect to the generated code.
The result is also perhaps a little more readable, as it
avoids the "zero is equal" idiom. Since it's so prevalent in
C, I think seasoned programmers tend not to even notice it
anymore, but it can sometimes make for awkward double
negations (e.g., we can drop a few !!oidcmp() instances
here).
This patch was generated almost entirely by the included
coccinelle patch. This mechanical conversion should be
completely safe, because we check explicitly for cases where
oidcmp() is compared to 0, which is what oideq() is doing
under the hood. Note that we don't have to catch "!oidcmp()"
separately; coccinelle's standard isomorphisms make sure the
two are treated equivalently.
I say "almost" because I did hand-edit the coccinelle output
to fix up a few style violations (it mostly keeps the
original formatting, but sometimes unwraps long lines).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix an incorrect comment in the new code added in b4da37380b
(unpack-trees: optimize walking same trees with cache-tree -
2018-08-18) and document about the new test variable that is enabled
by default in test-lib.sh in 4592e6080f (cache-tree: verify valid
cache-tree in the test suite - 2018-08-18)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes sure that cache-tree is consistent with the index. The main
purpose is to catch potential problems by saving the index in
unpack_trees() but the line in write_index() would also help spot
missing invalidation in other code.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Any changes to the output index should be (confusingly) marked in the
source index with invalidate_ce_path(). This is used to make sure we
still have valid untracked cache and cache-tree extensions in the end.
We do a pretty good job of invalidating except in two places.
verify_clean_subdirectory() is part of verify_absent() and
verify_absent_sparse(). The former is usually called by merged_entry()
or directly in threeway_merge(). The latter is obviously used by
sparse checkout.
In these three call sites, only merged_entry() follows up with
invalidate_ce_path(). The other two don't, but they should not trigger
this ce removal because this is about D/F conflicts [1]. But let's be
safe and invalidate_ce_path() here as well.
The second place is keep_entry() which is also used by threeway_merge()
to keep higher stage entries. In order to reuse cache-tree we need to
invalidate these paths as well. It's not a problem in the past because
whenever a higher stage entry is present, cache-tree will not be
created [2]. Now we salvage cache-tree even when higher stage entries
are present, we need more invalidation.
[1] c81935348b (Fix switching to a branch with D/F when current branch
has file D. - 2007-03-15)
[2] This is probably too strict. We should be able to create and save
cache-tree for the directories that do not have conflict entries
in cache_tree_update(). And this becomes more important when
cache-tree plays bigger role in terms of performance.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do n-way merge by walking the source index and n trees at the same
time and add merge results to a new temporary index called o->result.
The merge result for any given path could be either
- keep_entry(): same old index entry in o->src_index is reused
- merged_entry(): either a new entry is added, or an existing one updated
- deleted_entry(): one entry from o->src_index is removed
For some reason [1] we keep making sure that the source index's
cache-tree is still valid if used by o->result: for all those
merged/deleted entries, we invalidate the same path in o->src_index,
so only cache-trees covering the "keep_entry" parts remain good.
Because of this, the cache-tree from o->src_index can be perfectly
reused in o->result. And in fact we already rely on this logic to
reuse untracked cache in edf3b90553 (unpack-trees: preserve index
extensions - 2017-05-08). Move the cache-tree to o->result before
doing cache_tree_update() to reduce hashing cost.
Since cache_tree_update() has risen up as one of the most expensive
parts in unpack_trees() after the last few patches. This does help
reduce unpack_trees() time significantly (on webkit.git):
before after
--------------------------------------------------------------------
0.080394752 0.051258167 s: read cache .git/index
0.216010838 0.212106298 s: preload index
0.008534301 0.280521764 s: refresh index
0.251992198 0.218160442 s: traverse_trees
0.377031383 0.374948191 s: check_updates
0.372768105 0.037040114 s: cache_tree_update
1.045887251 0.672031609 s: unpack_trees
0.314983512 0.317456290 s: write index, changed mask = 2e
0.062572653 0.038382654 s: traverse_trees
0.000022544 0.000042731 s: check_updates
0.073795585 0.050930053 s: unpack_trees
0.073807557 0.051099735 s: diff-index
1.938191592 1.614241153 s: git command: git checkout -
[1] I'm pretty sure the reason is an oversight in 34110cd4e3 (Make
'unpack_trees()' have a separate source and destination index -
2008-03-06). That patch aims to _not_ update the source index at
all. The invalidation should have been done on o->result in that
patch. But then there was no cache-tree on o->result even then so
it's pointless to do so.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a micro optimization that probably only shines on repos with
deep directory structure. Instead of allocating and freeing a new
cache_entry in every iteration, we reuse the last one and only update
the parts that are new each iteration.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to merge one or many trees with the index, unpack-trees code
walks multiple trees in parallel with the index and performs n-way
merge. If we find out at start of a directory that all trees are the
same (by comparing OID) and cache-tree happens to be available for
that directory as well, we could avoid walking the trees because we
already know what these trees contain: it's flattened in what's called
"the index".
The upside is of course a lot less I/O since we can potentially skip
lots of trees (think subtrees). We also save CPU because we don't have
to inflate and apply the deltas. The downside is of course more
fragile code since the logic in some functions are now duplicated
elsewhere.
"checkout -" with this patch on webkit.git (275k files):
baseline new
--------------------------------------------------------------------
0.056651714 0.080394752 s: read cache .git/index
0.183101080 0.216010838 s: preload index
0.008584433 0.008534301 s: refresh index
0.633767589 0.251992198 s: traverse_trees
0.340265448 0.377031383 s: check_updates
0.381884638 0.372768105 s: cache_tree_update
1.401562947 1.045887251 s: unpack_trees
0.338687914 0.314983512 s: write index, changed mask = 2e
0.411927922 0.062572653 s: traverse_trees
0.000023335 0.000022544 s: check_updates
0.423697246 0.073795585 s: unpack_trees
0.423708360 0.073807557 s: diff-index
2.559524127 1.938191592 s: git command: git checkout -
Another measurement from Ben's running "git checkout" with over 500k
trees (on the whole series):
baseline new
----------------------------------------------------------------------
0.535510167 0.556558733 s: read cache .git/index
0.3057373 0.3147105 s: initialize name hash
0.0184082 0.023558433 s: preload index
0.086910967 0.089085967 s: refresh index
7.889590767 2.191554433 s: unpack trees
0.120760833 0.131941267 s: update worktree after a merge
2.2583504 2.572663167 s: repair cache-tree
0.8916137 0.959495233 s: write index, changed mask = 28
3.405199233 0.2710663 s: unpack trees
0.000999667 0.0021554 s: update worktree after a merge
3.4063306 0.273318333 s: diff-index
16.9524923 9.462943133 s: git command: git.exe checkout
This command calls unpack_trees() twice, the first time on 2way merge
and the second 1way merge. In both times, "unpack trees" time is
reduced to one third. Overall time reduction is not that impressive of
course because index operations take a big chunk. And there's that
repair cache-tree line.
PS. A note about cache-tree invalidation and the use of it in this
code.
We do invalidate cache-tree in _source_ index when we add new entries
to the (temporary) "result" index. But we also use the cache-tree from
source index in this optimization. Does this mean we end up having no
cache-tree in the source index to activate this optimization?
The answer is twisted: the order of finding a good cache-tree and
invalidating it matters. In this case we check for a good cache-tree
first in all_trees_same_as_cache_tree(), then we start to merge things
and potentially invalidate that same cache-tree in the process. Since
cache-tree invalidation happens after the optimization kicks in, we're
still good. But we may lose that cache-tree at the very first
call_unpack_fn() call in traverse_by_cache_tree().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're going to optimize unpack_trees() a bit in the following
patches. Let's add some tracing to measure how long it takes before
and after. This is the baseline ("git checkout -" on webkit.git, 275k
files on worktree)
performance: 0.056651714 s: read cache .git/index
performance: 0.183101080 s: preload index
performance: 0.008584433 s: refresh index
performance: 0.633767589 s: traverse_trees
performance: 0.340265448 s: check_updates
performance: 0.381884638 s: cache_tree_update
performance: 1.401562947 s: unpack_trees
performance: 0.338687914 s: write index, changed mask = 2e
performance: 0.411927922 s: traverse_trees
performance: 0.000023335 s: check_updates
performance: 0.423697246 s: unpack_trees
performance: 0.423708360 s: diff-index
performance: 2.559524127 s: git command: git checkout -
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".
This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.
In the case that we can't rely on filesystem (via inode number) to do
this check, fall back to fspathcmp() which is not perfect but should
not give false positives.
This patch is tested with vim-colorschemes and Sublime-Gitignore
repositories on a JFS partition with case insensitive support on
Linux.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since attr checking API now take the index, there's no need to set an
index in advance with this call. Most call sites are straightforward
because they either pass the_index or NULL (which defaults back to
the_index previously). There's only one suspicious call site in
unpack-trees.c where it sets a different index.
This code in unpack-trees is about to check out entries from the
new/temporary index after merging is done in it. The attributes will
be used by entry.c code to do crlf conversion if needed. entry.c now
respects struct checkout's istate field, and this field is correctly
set in unpack-trees.c, there should be no regression from this change.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both functions that are updated in this commit are called by
verify_absent(), which is part of the "unpack-trees" operation that is
supposed to work on any index file specified by the caller. Thanks to
Brandon [1] [2], an implicit dependency on the_index is exposed. This
commit fixes it.
In both functions, it makes sense to use src_index to check for
exclusion because it's almost unchanged and should give us the same
outcome as if running the exclude check before the unpack.
It's "almost unchanged" because we do invalidate cache-tree and
untracked cache in the source index. But this should not affect how
exclude machinery uses the index: to see if a file is tracked, and to
read a blob from the index instead of worktree if it's marked
skip-worktree (i.e. it's not available in worktree)
[1] a0bba65b10 (dir: convert is_excluded to take an index - 2017-05-05
[2] 2c1eb10454 (dir: convert read_directory to take an index - 2017-05-05)
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prior to fba92be8f7, this code implicitly (and incorrectly) assumes
the_index when running the exclude machinery. fba92be8f7 helps show
this problem clearer because unpack-trees operation is supposed to
work on whatever index the caller specifies... not specifically
the_index.
Update the code to use "istate" argument that's originally from
mark_new_skip_worktree(). From the call sites, both in unpack_trees(),
you can see that this function works on two separate indexes:
o->src_index and o->result. The second mark_new_skip_worktree() so far
has incorecctly applied exclude rules on o->src_index instead of
o->result. It's unclear what is the consequences of this, but it's
definitely wrong.
[1] fba92be8f7 (dir: convert is_excluded_from_list to take an index -
2017-05-05)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function mark_new_skip_worktree() has an argument named the_index
which is also the name of a global variable. While they have different
types (the global the_index is not a pointer) mistakes can easily
happen and it's also confusing for readers. Rename the function
argument to something other than the_index.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After modify/delete merge conflict happens in a file skipped by sparse
checkout, "git reset --merge", which implements the "--abort" actions,
and "git reset --hard" fail with message "Entry * not uptodate. Cannot
update sparse checkout."
As explained in [1], the up-to-date checker mistakenly treats conflicted
entry which does not exist in HEAD as still skipped by sparse checkout.
Use the fix suggested in [1]. Also, add test case which verifies the
issue is fixed.
[1] https://public-inbox.org/git/20180616051444.GA29754@duynguyen.home/
Signed-off-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Max Kirillov <max@max630.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading large indexes from disk, a portion of the time is
dominated in malloc() calls. This can be mitigated by allocating a
large block of memory and manage it ourselves via memory pools.
This change moves the cache entry allocation to be on top of memory
pools.
Design:
The index_state struct will gain a notion of an associated memory_pool
from which cache_entries will be allocated from. When reading in the
index from disk, we have information on the number of entries and
their size, which can guide us in deciding how large our initial
memory allocation should be. When an index is discarded, the
associated memory_pool will be discarded as well - so the lifetime of
a cache_entry is tied to the lifetime of the index_state that it was
allocated for.
In the case of a Split Index, the following rules are followed. 1st,
some terminology is defined:
Terminology:
- 'the_index': represents the logical view of the index
- 'split_index': represents the "base" cache entries. Read from the
split index file.
'the_index' can reference a single split_index, as well as
cache_entries from the split_index. `the_index` will be discarded
before the `split_index` is. This means that when we are allocating
cache_entries in the presence of a split index, we need to allocate
the entries from the `split_index`'s memory pool. This allows us to
follow the pattern that `the_index` can reference cache_entries from
the `split_index`, and that the cache_entries will not be freed while
they are still being referenced.
Managing transient cache_entry structs:
Cache entries are usually allocated for an index, but this is not always
the case. Cache entries are sometimes allocated because this is the
type that the existing checkout_entry function works with. Because of
this, the existing code needs to handle cache entries associated with an
index / memory pool, and those that only exist transiently. Several
strategies were contemplated around how to handle this:
Chosen approach:
An extra field was added to the cache_entry type to track whether the
cache_entry was allocated from a memory pool or not. This is currently
an int field, as there are no more available bits in the existing
ce_flags bit field. If / when more bits are needed, this new field can
be turned into a proper bit field.
Alternatives:
1) Do not include any information about how the cache_entry was
allocated. Calling code would be responsible for tracking whether the
cache_entry needed to be freed or not.
Pro: No extra memory overhead to track this state
Con: Extra complexity in callers to handle this correctly.
The extra complexity and burden to not regress this behavior in the
future was more than we wanted.
2) cache_entry would gain knowledge about which mem_pool allocated it
Pro: Could (potentially) do extra logic to know when a mem_pool no
longer had references to any cache_entry
Con: cache_entry would grow heavier by a pointer, instead of int
We didn't see a tangible benefit to this approach
3) Do not add any extra information to a cache_entry, but when freeing a
cache entry, check if the memory exists in a region managed by existing
mem_pools.
Pro: No extra memory overhead to track state
Con: Extra computation is performed when freeing cache entries
We decided tracking and iterating over known memory pool regions was
less desirable than adding an extra field to track this stae.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It has been observed that the time spent loading an index with a large
number of entries is partly dominated by malloc() calls. This change
is in preparation for using memory pools to reduce the number of
malloc() calls made to allocate cahce entries when loading an index.
Add an API to allocate and discard cache entries, abstracting the
details of managing the memory backing the cache entries. This commit
does actually change how memory is managed - this will be done in a
later commit in the series.
This change makes the distinction between cache entries that are
associated with an index and cache entries that are not associated with
an index. A main use of cache entries is with an index, and we can
optimize the memory management around this. We still have other cases
where a cache entry is not persisted with an index, and so we need to
handle the "transient" use case as well.
To keep the congnitive overhead of managing the cache entries, there
will only be a single discard function. This means there must be enough
information kept with the cache entry so that we know how to discard
them.
A summary of the main functions in the API is:
make_cache_entry: create cache entry for use in an index. Uses specified
parameters to populate cache_entry fields.
make_empty_cache_entry: Create an empty cache entry for use in an index.
Returns cache entry with empty fields.
make_transient_cache_entry: create cache entry that is not used in an
index. Uses specified parameters to populate
cache_entry fields.
make_empty_transient_cache_entry: create cache entry that is not used in
an index. Returns cache entry with
empty fields.
discard_cache_entry: A single function that knows how to discard a cache
entry regardless of how it was allocated.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The strings allocated in `setup_unpack_trees_porcelain()` are never
freed. Provide a function `clear_unpack_trees_porcelain()` to do so and
call it where we use `setup_unpack_trees_porcelain()`. The only
non-trivial user is `unpack_trees_start()`, where we should place the
new call in `unpack_trees_finish()`.
We keep the string pointers in an array, mixing pointers to static
memory and memory that we allocate on the heap. We also keep several
copies of the individual pointers. So we need to make sure that we do
not free what we must not free and that we do not double-free. Let a
separate argv_array take ownership of all the strings we create so that
we can easily free them.
Zero the whole array of string pointers to make sure that we do not
leave any dangling pointers.
Note that we only take responsibility for the memory allocated in
`setup_unpack_trees_porcelain()` and not any other members of the
`struct unpack_trees_options`.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This should make these functions easier to find and cache.h less
overwhelming to read.
In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file
As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise. It would be better to #include wherever
identifiers from the header are used. That can happen later
when we have better tooling for it.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes an issue that existed before my directory rename detection
patches that affects both normal renames and renames implied by
directory rename detection. Additional codepaths that only affect
overwriting of dirty files that are involved in directory rename
detection will be added in a subsequent commit.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In d8193743e0 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae5
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).
The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.
Let's just convert all remaining ones in one fell swoop.
This trick was performed by this invocation:
sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adjust struct index_state to use struct object_id instead of unsigned
char [20].
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, all callers of unpack_trees() set o->src_index == o->dst_index.
The code in unpack_trees() does not correctly handle them being different.
There are two separate issues:
First, there is the possibility of memory corruption. Since
unpack_trees() creates a temporary index in o->result and then discards
o->dst_index and overwrites it with o->result, in the special case that
o->src_index == o->dst_index, it is safe to just reuse o->src_index's
split_index for o->result. However, when src and dst are different,
reusing o->src_index's split_index for o->result will cause the
split_index to be shared. If either index then has entries replaced or
removed, it will result in the other index referring to free()'d memory.
Second, we can drop the index extensions. Previously, we were moving
index extensions from o->dst_index to o->result. Since o->src_index is
the one that will have the necessary extensions (o->dst_index is likely to
be a new index temporary index created to store the results), we should be
moving the index extensions from there.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit e4bb62fa1e, reversing
changes made to 468165c1d8.
The topic appears to inflict severe regression in renaming merges,
even though the promise of it was that it would improve them.
We do not yet know which exact change in the topic was wrong, but in
the meantime, let's play it safe and revert it out of 'master'
before real Git-using projects are harmed.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At some point we may want to rename the function so that it describes what
it actually does as 'submodule_free' doesn't quite describe that this
clears a repository's submodule cache. But that's beyond the scope of
this series.
While at it remove the extern key word from its declaration.
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes an issue that existed before my directory rename detection
patches that affects both normal renames and renames implied by
directory rename detection. Additional codepaths that only affect
overwriting of dirty files that are involved in directory rename
detection will be added in a subsequent commit.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_directory() code ignores all paths named ".git" even if it's not
a valid git repository. See treat_path() for details. Since ".git" is
basically invisible to read_directory(), when we are asked to
invalidate a path that contains ".git", we can safely ignore it
because the slow path would not consider it anyway.
This helps when fsmonitor is used and we have a real ".git" repo at
worktree top. Occasionally .git/index will be updated and if the
fsmonitor hook does not filter it, untracked cache is asked to
invalidate the path ".git/index".
Without this patch, we invalidate the root directory unncessarily,
which:
- makes read_directory() fall back to slow path for root directory
(slower)
- makes the index dirty (because UNTR extension is updated). Depending
on the index size, writing it down could also be slow.
A note about the new "safe_path" knob. Since this new check could be
relatively expensive, avoid it when we know it's not needed. If the
path comes from the index, it can't contain ".git". If it does
contain, we may be screwed up at many more levels, not just this one.
Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there is a one way merge, each submodule needs to be one way merged
as well, if we're asked to recurse into submodules.
In case of a submodule, check if it is up-to-date, otherwise set the
flag CE_UPDATE, which will trigger an update of it in the phase updating
the tree later.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running checkout, first prefetch all blobs that are to be updated
but are missing. This means that only one pack is downloaded during such
operations, instead of one per missing blob.
This operates only on the blob level - if a repository has a missing
tree, they are still fetched one at a time.
This does not use the delayed checkout mechanism introduced in commit
2841e8f ("convert: add "status=delayed" to filter process protocol",
2017-06-30) due to significant conceptual differences - in particular,
for partial clones, we already know what needs to be fetched based on
the contents of the local repo alone, whereas for status=delayed, it is
the filter process that tells us what needs to be checked in the end.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the declaration and definition of resolve_gitlink_ref to use
struct object_id and apply the following semantic patch:
@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3.hash)
+ resolve_gitlink_ref(E1, E2, &E3)
@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3->hash)
+ resolve_gitlink_ref(E1, E2, E3)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the index is read from disk, the fsmonitor index extension is used
to flag the last known potentially dirty index entries. The registered
core.fsmonitor command is called with the time the index was last
updated and returns the list of files changed since that time. This list
is used to flag any additional dirty cache entries and untracked cache
directories.
We can then use this valid state to speed up preload_index(),
ie_match_stat(), and refresh_cache_ent() as they do not need to lstat()
files to detect potential changes for those entries marked
CE_FSMONITOR_VALID.
In addition, if the untracked cache is turned on valid_cached_dir() can
skip checking directories for new or changed files as fsmonitor will
invalidate the cache only for those directories that have been
identified as having potential changes.
To keep the CE_FSMONITOR_VALID state accurate during git operations;
when git updates a cache entry to match the current state on disk,
it will now set the CE_FSMONITOR_VALID bit.
Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit
is cleared and the corresponding untracked cache directory is marked
invalid.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>