Commit Graph

27 Commits (2197f879f2f4821ed734fef8aff13a8fc49720c0)

Author SHA1 Message Date
Jeff King 5bf7f1eaa5 list-objects-filter: treat NULL filter_options as "disabled"
In most callers, we have an actual list_objects_filter_options struct,
and if no filtering is desired its "choice" element will be
LOFC_DISABLED. However, some code may have only a pointer to such a
struct which may be NULL (because _their_ callers didn't care about
filtering, either). Rather than forcing them to handle this explicitly
like:

  if (filter_options)
          traverse_commit_list_filtered(filter_options, revs,
	                                show_commit, show_object,
					show_data, NULL);
  else
          traverse_commit_list(revs, show_commit, show_object,
	                             show_data);

let's just treat a NULL filter_options the same as LOFC_DISABLED. We
only need a small change, since that option struct is converted into a
real filter only in the "init" function.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-04 21:57:58 -07:00
Junio C Hamano ad8f0368b4 Merge branch 'jk/partial-clone-sparse-blob'
The name of the blob object that stores the filter specification
for sparse cloning/fetching was interpreted in a wrong place in the
code, causing Git to abort.

* jk/partial-clone-sparse-blob:
  list-objects-filter: use empty string instead of NULL for sparse "base"
  list-objects-filter: give a more specific error sparse parsing error
  list-objects-filter: delay parsing of sparse oid
  t5616: test cloning/fetching with sparse:oid=<oid> filter
2019-10-07 11:32:54 +09:00
Jeff King a4cafc7379 list-objects-filter: use empty string instead of NULL for sparse "base"
We use add_excludes_from_blob_to_list() to parse a sparse blob. Since
we don't have a base path, we pass NULL and 0 for the base and baselen,
respectively. But the rest of the exclude code passes a literal empty
string instead of NULL for this case. And indeed, we eventually end up
with match_pathname() calling fspathncmp(), which then calls the system
strncmp(path, base, baselen).

This works on many platforms, which notice that baselen is 0 and do not
look at the bytes of "base" at all. But it does violate the C standard,
and building with SANITIZE=undefined will complain. You can also see it
by instrumenting fspathncmp like this:

	diff --git a/dir.c b/dir.c
	index d021c908e5..4bb3d3ec96 100644
	--- a/dir.c
	+++ b/dir.c
	@@ -71,6 +71,8 @@ int fspathcmp(const char *a, const char *b)

	 int fspathncmp(const char *a, const char *b, size_t count)
	 {
	+	if (!a || !b)
	+		BUG("null fspathncmp arguments");
	 	return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
	 }

We could perhaps be more defensive in match_pathname(), but even if we
did so, it makes sense for this code to match the rest of the exclude
callers.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-16 12:47:51 -07:00
Jon Simons cf34337f98 list-objects-filter: give a more specific error sparse parsing error
The sparse:oid filter has two error modes: we might fail to resolve the
name to an OID, or we might fail to parse the contents of that OID. In
the latter case, let's give a less generic error message, and mention
the OID we did find.

While we're here, let's also mark both messages as translatable.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-16 12:47:45 -07:00
Jeff King 4c96a77594 list-objects-filter: delay parsing of sparse oid
The list-objects-filter code has two steps to its initialization:

  1. parse_list_objects_filter() makes sure the spec is a filter we know
     about and is syntactically correct. This step is done by "rev-list"
     or "upload-pack" that is going to apply a filter, but also by "git
     clone" or "git fetch" before they send the spec across the wire.

  2. list_objects_filter__init() runs the type-specific initialization
     (using function pointers established in step 1). This happens at
     the start of traverse_commit_list_filtered(), when we're about to
     actually use the filter.

It's a good idea to parse as much as we can in step 1, in order to catch
problems early (e.g., a blob size limit that isn't a number). But one
thing we _shouldn't_ do is resolve any oids at that step (e.g., for
sparse-file contents specified by oid). In the case of a fetch, the oid
has to be resolved on the remote side.

The current code does resolve the oid during the parse phase, but
ignores any error (which we must do, because we might just be sending
the spec across the wire). This leads to two bugs:

  - if we're not in a repository (e.g., because it's git-clone parsing
    the spec), then we trigger a BUG() trying to resolve the name

  - if we did hit the error case, we still have to notice that later and
    bail. The code path in rev-list handles this, but the one in
    upload-pack does not, leading to a segfault.

We can fix both by moving the oid resolution into the sparse-oid init
function. At that point we know we have a repository (because we're
about to traverse), and handling the error there fixes the segfault.

As a bonus, we can drop the NULL sparse_oid_value check in rev-list,
since this is now handled in the sparse-oid-filter init function.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-16 12:47:37 -07:00
Derrick Stolee 468ce99b77 unpack-trees: rename 'is_excluded_from_list()'
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list'  makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

Now that this library is renamed to use 'struct pattern_list'
and 'struct pattern', we can now rename the method used by
the sparse-checkout feature to determine which paths should
appear in the working directory.

The method is_excluded_from_list() is only used by the
sparse-checkout logic in unpack-trees and list-objects-filter.
The confusing part is that it returned 1 for "excluded" (i.e.
it matches the list of exclusions) but that really manes that
the path matched the list of patterns for _inclusion_ in the
working directory.

Rename the method to be path_matches_pattern_list() and have
it return an explicit 'enum pattern_match_result'. Here, the
values MATCHED = 1, UNMATCHED = 0, and UNDECIDED = -1 agree
with the previous integer values. This shift allows future
consumers to better understand what the retur values mean,
and provides more type checking for handling those values.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05 14:05:12 -07:00
Derrick Stolee 65edd96aec treewide: rename 'exclude' methods to 'pattern'
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list'  makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.

This commit renames several methods defined in dir.h to make
more sense with the renamed 'struct exclude_list' to 'struct
pattern_list' and 'struct exclude' to 'struct path_pattern':

 * last_exclude_matching() -> last_matching_pattern()
 * parse_exclude() -> parse_path_pattern()

In addition, the word 'exclude' was replaced with 'pattern'
in the methods below:

 * add_exclude_list()
 * add_excludes_from_file_to_list()
 * add_excludes_from_file()
 * add_excludes_from_blob_to_list()
 * add_exclude()
 * clear_exclude_list()

A few methods with the word "exclude" remain. These will
be handled seperately. In particular, the method
"is_excluded()" is concretely about the .gitignore file
relative to a specific directory. This is the important
boundary between library and consumer: is_excluded() cares
about .gitignore, but is_excluded() calls
last_matching_pattern() to make that decision.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05 14:05:12 -07:00
Derrick Stolee caa3d55444 treewide: rename 'struct exclude_list' to 'struct pattern_list'
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list'  makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.

This commit renames 'struct exclude_list' to 'struct pattern_list'
and renames several variables called 'el' to 'pl'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05 14:05:11 -07:00
Matthew DeVore e987df5fe6 list-objects-filter: implement composite filters
Allow combining filters such that only objects accepted by all filters
are shown. The motivation for this is to allow getting directory
listings without also fetching blobs. This can be done by combining
blob:none with tree:<depth>. There are massive repositories that have
larger-than-expected trees - even if you include only a single commit.

A combined filter supports any number of subfilters, and is written in
the following form:

	combine:<filter 1>+<filter 2>+<filter 3>

Certain non-alphanumeric characters in each filter must be
URL-encoded.

For now, combined filters must be specified in this form. In a
subsequent commit, rev-list will support multiple --filter arguments
which will have the same effect as specifying one filter argument
starting with "combine:". The documentation will be updated in that
commit, as the URL-encoding scheme is in general not meant to be used
directly by the user, and it is better to describe the URL-encoding
feature in terms of the repeated flag.

Helped-by: Emily Shaffer <emilyshaffer@google.com>
Helped-by: Jeff Hostetler <git@jeffhostetler.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 08:41:53 -07:00
Matthew DeVore 7a7c7f4a6d list-objects-filter: put omits set in filter struct
The oidset *omits pointer must be accessed by the combine filter in a
type-agnostic way once the graph traversal is over. Store that pointer
in the general `filter` struct. This will be used in a follow-up patch
to implement the combine filter.

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 08:41:53 -07:00
Matthew DeVore 9430147ca0 list-objects-filter: encapsulate filter components
Encapsulate filter_fn, filter_free_fn, and filter_data into their own
opaque struct.

Due to opaqueness, filter_fn and filter_free_fn can no longer be
accessed directly by users. Currently, all usages of filter_fn are
guarded by a necessary check:

	(obj->flags & NOT_USER_GIVEN) && filter_fn

Take the opportunity to include this check into the new function
list_objects_filter__filter_object(), so that we no longer need to write
this check at every caller of the filter function.

Also, the init functions in list-objects-filter.c no longer need to
confusingly return the filter constituents in various places (filter_fn
and filter_free_fn as out parameters, and filter_data as the function's
return value); they can just initialize the "struct filter" passed in.

Helped-by: Jeff Hostetler <git@jeffhostetler.com>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 08:41:53 -07:00
Junio C Hamano 34032c4f8f Merge branch 'md/list-objects-filter-memfix'
The filter_data used in the list-objects-filter (which manages a
lazily sparse clone repository) did not use the dynamic array API
correctly---'nr' is supposed to point at one past the last element
of the array in use.  This has been corrected.

* md/list-objects-filter-memfix:
  list-objects-filter: correct usage of ALLOC_GROW
2019-06-21 11:24:09 -07:00
Matthew DeVore 7140600e2e list-objects-filter: correct usage of ALLOC_GROW
In the sparse filter data, array_frame array is used in a way such that
nr is the index of the last element. Fix this so that nr is actually the
number of elements in the array.

The filter_sparse_free function also has an unaddressed TODO to free the
memory associated with the sparse filter data. Address that TODO and fix
the memory leak.

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-31 12:36:07 -07:00
Christian Couder e693237e2b list-objects-filter: disable 'sparse:path' filters
If someone wants to use as a filter a sparse file that is in the
repository, something like "--filter=sparse:oid=<ref>:<path>"
already works.

So 'sparse:path' is only interesting if the sparse file is not in
the repository. In this case though the current implementation has
a big security issue, as it makes it possible to ask the server to
read any file, like for example /etc/password, and to explore the
filesystem, as well as individual lines of files.

If someone is interested in using a sparse file that is not in the
repository as a filter, then at the minimum a config option, such
as "uploadpack.sparsePathFilter", should be implemented first to
restrict the directory from which the files specified by
'sparse:path' can be read.

For now though, let's just disable 'sparse:path' filters.

Helped-by: Matthew DeVore <matvore@google.com>
Helped-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-29 11:05:34 -07:00
Matthew DeVore 8272f26034 tree:<depth>: skip some trees even when collecting omits
If a tree has already been recorded as omitted, we don't need to
traverse it again just to collect its omits. Stop traversing trees a
second time when collecting omits.

Signed-off-by: Matthew DeVore <matvore@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15 15:39:34 -08:00
Matthew DeVore c813a7c35f list-objects-filter: teach tree:# how to handle >0
Implement positive values for <depth> in the tree:<depth> filter. The
exact semantics are described in Documentation/rev-list-options.txt.

The long-term goal at the end of this is to allow a partial clone to
eagerly fetch an entire directory of files by fetching a tree and
specifying <depth>=1. This, for instance, would make a build operation
fast and convenient. It is fast because the partial clone does not need
to fetch each file individually, and convenient because the user does
not need to supply a sparse-checkout specification.

Another way of considering this feature is as a way to reduce
round-trips, since the client can get any number of levels of
directories in a single request, rather than wait for each level of tree
objects to come back, whose entries are used to construct a new request.

Signed-off-by: Matthew DeVore <matvore@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15 15:39:34 -08:00
Nguyễn Thái Ngọc Duy 01d40c8487 list-objects-filter.c: remove implicit dependency on the_index
While at there, since we have access to struct repository now,
eliminate the only the_repository reference in this file.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12 14:50:05 +09:00
Matthew DeVore 8b10a206f0 list-objects: support for skipping tree traversal
The tree:0 filter does not need to traverse the trees that it has
filtered out, so optimize list-objects and list-objects-filter to skip
traversing the trees entirely. Before this patch, we iterated over all
children of the tree, and did nothing for all of them, which was
wasteful.

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-18 12:49:18 +09:00
Matthew DeVore bc5975d24f list-objects-filter: implement filter tree:0
Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, but actually they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07 08:55:00 +09:00
Matthew DeVore 696aa73905 list-objects-filter: use BUG rather than die
In some cases in this file, BUG makes more sense than die. In such
cases, a we get there from a coding error rather than a user error.

'return' has been removed following some instances of BUG since BUG does
not return.

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07 08:55:00 +09:00
Junio C Hamano 00624d608c Merge branch 'sb/object-store-grafts'
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.

* sb/object-store-grafts:
  commit: allow lookup_commit_graft to handle arbitrary repositories
  commit: allow prepare_commit_graft to handle arbitrary repositories
  shallow: migrate shallow information into the object parser
  path.c: migrate global git_path_* to take a repository argument
  cache: convert get_graft_file to handle arbitrary repositories
  commit: convert read_graft_file to handle arbitrary repositories
  commit: convert register_commit_graft to handle arbitrary repositories
  commit: convert commit_graft_pos() to handle arbitrary repositories
  shallow: add repository argument to is_repository_shallow
  shallow: add repository argument to check_shallow_file_for_update
  shallow: add repository argument to register_shallow
  shallow: add repository argument to set_alternate_shallow_file
  commit: add repository argument to lookup_commit_graft
  commit: add repository argument to prepare_commit_graft
  commit: add repository argument to read_graft_file
  commit: add repository argument to register_commit_graft
  commit: add repository argument to commit_graft_pos
  object: move grafts to object parser
  object-store: move object access functions to object-store.h
2018-07-18 12:20:28 -07:00
Junio C Hamano fcb6df3254 Merge branch 'sb/oid-object-info'
The codepath around object-info API has been taught to take the
repository object (which in turn tells the API which object store
the objects are to be located).

* sb/oid-object-info:
  cache.h: allow oid_object_info to handle arbitrary repositories
  packfile: add repository argument to cache_or_unpack_entry
  packfile: add repository argument to unpack_entry
  packfile: add repository argument to read_object
  packfile: add repository argument to packed_object_info
  packfile: add repository argument to packed_to_object_type
  packfile: add repository argument to retry_bad_packed_offset
  cache.h: add repository argument to oid_object_info
  cache.h: add repository argument to oid_object_info_extended
2018-05-23 14:38:16 +09:00
Stefan Beller cbd53a2193 object-store: move object access functions to object-store.h
This should make these functions easier to find and cache.h less
overwhelming to read.

In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file

As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise.  It would be better to #include wherever
identifiers from the header are used.  That can happen later
when we have better tooling for it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
Stefan Beller 0df8e96566 cache.h: add repository argument to oid_object_info
Add a repository argument to allow the callers of oid_object_info
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-26 10:54:27 +09:00
Stefan Beller e5e5e08832 sha1_name.c: rename to use dash in file name
This is more consistent with the project style. The majority of Git's
source files use dashes in preference to underscores in their file names.

Signed-off-by: Stefan Beller <sbeller@google.com>
2018-04-11 18:11:00 +09:00
brian m. carlson abef9020e3 sha1_file: convert sha1_object_info* to object_id
Convert sha1_object_info and sha1_object_info_extended to take pointers
to struct object_id and rename them to use "oid" instead of "sha1" in
their names.  Update the declaration and definition and apply the
following semantic patch, plus the standard object_id transforms:

@@
expression E1, E2;
@@
- sha1_object_info(E1.hash, E2)
+ oid_object_info(&E1, E2)

@@
expression E1, E2;
@@
- sha1_object_info(E1->hash, E2)
+ oid_object_info(E1, E2)

@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1.hash, E2, E3)
+ oid_object_info_extended(&E1, E2, E3)

@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1->hash, E2, E3)
+ oid_object_info_extended(E1, E2, E3)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
Jeff Hostetler 25ec7bcac0 list-objects: filter objects in traverse_commit_list
Create traverse_commit_list_filtered() and add filtering
interface to allow certain objects to be omitted from the
traversal.

Update traverse_commit_list() to be a wrapper for the above
with a null filter to minimize the number of callers that
needed to be changed.

Object filtering will be used in a future commit by rev-list
and pack-objects for partial clone and fetch to omit unwanted
objects from the result.

traverse_bitmap_commit_list() does not work with filtering.
If a packfile bitmap is present, it will not be used.  It
should be possible to extend such support in the future (at
least to simple filters that do not require object pathnames),
but that is beyond the scope of this patch series.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-22 14:11:57 +09:00