"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.
* ds/name-hash-tweaks:
pack-objects: prevent name hash version change
test-tool: add helper for name-hash values
p5313: add size comparison test
pack-objects: add GIT_TEST_NAME_HASH_VERSION
repack: add --name-hash-version option
pack-objects: add --name-hash-version option
pack-objects: create new name-hash function version
The API around choosing to use unsafe variant of SHA-1
implementation has been updated in an attempt to make it harder to
abuse.
* tb/unsafe-hash-cleanup:
hash.h: drop unsafe_ function variants
csum-file: introduce hashfile_checkpoint_init()
t/helper/test-hash.c: use unsafe_hash_algo()
csum-file.c: use unsafe_hash_algo()
hash.h: introduce `unsafe_hash_algo()`
csum-file.c: extract algop from hashfile_checksum_valid()
csum-file: store the hash algorithm as a struct field
t/helper/test-tool: implement sha1-unsafe helper
Introduce a new API to visit objects in batches based on a common
path, or by type.
* ds/path-walk-1:
path-walk: drop redundant parse_tree() call
path-walk: reorder object visits
path-walk: mark trees and blobs as UNINTERESTING
path-walk: visit tags and cached objects
path-walk: allow consumer to specify object types
t6601: add helper for testing path-walk API
test-lib-functions: add test_cmp_sorted
path-walk: introduce an object walk by path
Add a new test-tool helper, name-hash, to output the value of the
name-hash algorithms for the input list of strings, one per line.
Since the name-hash values can be stored in the .bitmap files, it is
important that these hash functions do not change across Git versions.
Add a simple test to t5310-pack-bitmaps.sh to provide some testing of
the current values. Due to how these functions are implemented, it would
be difficult to change them without disturbing these values. The paths
used for this test are carefully selected to demonstrate some of the
behavior differences of the two current name hash versions, including
which conditions will cause them to collide.
Create a performance test that uses test_size to demonstrate how
collisions occur for these hash algorithms. This test helps inform
someone as to the behavior of the name-hash algorithms for their repo
based on the paths at HEAD.
My copy of the Git repository shows modest statistics around the
collisions of the default name-hash algorithm:
Test this tree
--------------------------------------------------
5314.1: paths at head 4.5K
5314.2: distinct hash value: v1 4.1K
5314.3: maximum multiplicity: v1 13
5314.4: distinct hash value: v2 4.2K
5314.5: maximum multiplicity: v2 9
Here, the maximum collision multiplicity is 13, but around 10% of paths
have a collision with another path.
In a more interesting example, the microsoft/fluentui [1] repo had these
statistics at time of committing:
Test this tree
--------------------------------------------------
5314.1: paths at head 19.5K
5314.2: distinct hash value: v1 8.2K
5314.3: maximum multiplicity: v1 279
5314.4: distinct hash value: v2 17.8K
5314.5: maximum multiplicity: v2 44
[1] https://github.com/microsoft/fluentui
That demonstrates that of the nearly twenty thousand path names, they
are assigned around eight thousand distinct values. 279 paths are
assigned to a single value, leading the packing algorithm to sort
objects from those paths together, by size.
With the v2 name hash function, the maximum multiplicity lowers to 44,
leaving some room for further improvement.
In a more extreme example, an internal monorepo had a much worse
collision rate:
Test this tree
--------------------------------------------------
5314.1: paths at head 227.3K
5314.2: distinct hash value: v1 72.3K
5314.3: maximum multiplicity: v1 14.4K
5314.4: distinct hash value: v2 166.5K
5314.5: maximum multiplicity: v2 138
Here, we can see that the v2 name hash function provides somem
improvements, but there are still a number of collisions that could lead
to repacking problems at this scale.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the new "unsafe" SHA-1 build knob, it is convenient to have a
test-tool that can exercise Git's unsafe SHA-1 wrappers for testing,
similar to 't/helper/test-tool sha1'.
Implement that helper by altering the implementation of that test-tool
(in cmd_hash_impl(), which is generic and parameterized over different
hash functions) to conditionally run the unsafe variants of the chosen
hash function, and expose the new behavior via a new 'sha1-unsafe' test
helper.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add some tests based on the current behavior, doing interesting checks
for different sets of branches, ranges, and the --boundary option. This
sets a baseline for the behavior and we can extend it as new options are
introduced.
Store and output a 'batch_nr' value so we can demonstrate that the paths are
grouped together in a batch and not following some other ordering. This
allows us to test the depth-first behavior of the path-walk API. However, we
purposefully do not test the order of the objects in the batch, so the
output is compared to the expected output through a sort.
It is important to mention that the behavior of the API will change soon as
we start to handle UNINTERESTING objects differently, but these tests will
demonstrate the change in behavior.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a bunch of loops which iterate up to an unsigned boundary using
a signed index, which generates warnigs because we compare a signed and
unsigned value in the loop condition. Address these sites for trivial
cases and enable `-Wsign-compare` warnings for these code units.
This patch only adapts those code units where we can drop the
`DISABLE_SIGN_COMPARE_WARNINGS` macro in the same step.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Another reftable test migrated to the unit-test framework.
* cp/unit-test-reftable-stack:
t-reftable-stack: add test for stack iterators
t-reftable-stack: add test for non-default compaction factor
t-reftable-stack: use reftable_ref_record_equal() to compare ref records
t-reftable-stack: use Git's tempfile API instead of mkstemp()
t: harmonize t-reftable-stack.c with coding guidelines
t: move reftable/stack_test.c to the unit testing framework
reftable/stack_test.c exercises the functions defined in
reftable/stack.{c, h}. Migrate reftable/stack_test.c to the
unit testing framework. Migration involves refactoring the tests
to use the unit testing framework instead of reftable's test
framework and renaming the tests to be in-line with unit-tests'
standards.
Since some of the tests use set_test_hash() defined by
reftable/test_framework.{c, h} but these files are not
'#included' in the test file, copy this function in the
ported test file.
With the migration of stack test to the unit-tests framework,
"test-tool reftable" becomes a no-op. Hence, get rid of everything
that uses "test-tool reftable" alongside everything that is used
to implement it.
While at it, alphabetically sort the cmds[] list in
helper/test-tool.c by moving the entry for "dump-reftable".
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
helper/test-oid-array.c along with t0064-oid-array.sh test the
oid-array.h API, which provides storage and processing
efficiency over large lists of object identifiers.
Migrate them to the unit testing framework for better runtime
performance and efficiency. As we don't initialize a repository
in these tests, the hash algo that functions like oid_array_lookup()
use is not initialized, therefore call repo_set_hash_algo() to
initialize it. And init_hash_algo():lib-oid.c can aid in this
process, so make it public.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
helper/test-urlmatch-normalization along with
t0110-urlmatch-normalization test the `url_normalize()` function from
'urlmatch.h'. Migrate them to the unit testing framework for better
performance. And also add different test_msg()s for better debugging.
In the migration, last two of the checks from `t_url_general_escape()`
were slightly changed compared to the shell script. This involves
changing
'\'' -> '
'\!' -> !
in the urls of those checks. This is because in C strings, we don't
need to escape "'" and "!". Other than these two, all the urls were
pasted verbatim from the shell script.
Another change is the removal of a MINGW prerequisite from one of the
test. It was there because[1] on Windows, the command line is a
Unicode string, it is not possible to pass arbitrary bytes to a
program. But in unit tests we don't have this limitation.
And since we can construct strings with arbitrary bytes in C, let's
also remove the test files which contain URLs with arbitrary bytes in
the 't/t0110' directory and instead embed those URLs in the unit test
code itself.
[1]: https://lore.kernel.org/git/53CAC8EF.6020707@gmail.com/
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
helper/test-oidmap.c along with t0016-oidmap.sh test the oidmap.h
library which is built on top of hashmap.h.
Migrate them to the unit testing framework for better performance,
concise code and better debugging. Along with the migration also plug
memory leaks and make the test logic independent for all the tests.
The migration removes 'put' tests from t0016, because it is used as
setup to all the other tests, so testing it separately does not yield
any benefit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Reviewed-by: Josh Steadmon <steadmon@google.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A test helper that essentially is unit tests on the "decorate"
logic has been rewritten using the unit-tests framework.
* gt/decorate-unit-test:
t/: migrate helper/test-example-decorate to the unit testing framework
helper/test-oidtree.c along with t0069-oidtree.sh test the oidtree.h
library, which is a wrapper around crit-bit tree. Migrate them to
the unit testing framework for better debugging and runtime
performance. Along with the migration, add an extra check for
oidtree_each() test, which showcases how multiple expected matches can
be given to check_each() helper.
To achieve this, introduce a new library called 'lib-oid.h'
exclusively for the unit tests to use. It currently mainly includes
utility to generate object_id from an arbitrary hex string
(i.e. '12a' -> '12a0000000000000000000000000000000000000'). This also
handles the hash algo selection based on GIT_TEST_DEFAULT_HASH.
This library will also be helpful when we port other unit tests such
as oid-array, oidset etc.
Helped-by: Junio C Hamano <gitster@pobox.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
[jc: small fixlets squashed in]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
helper/test-example-decorate.c along with t9004-example.sh provide
an example of how to use the functions in decorate.h (which provides
a data structure that associates Git objects to void pointers) and
also test their output.
Migrate them to the new unit testing framework for better debugging
and runtime performance.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the recent codebase update (8bf6fbd (Merge branch
'js/doc-unit-tests', 2023-12-09)), a new unit testing framework was
merged, providing a standardized approach for testing C code. Prior to
this update, some unit tests relied on the test helper mechanism,
lacking a dedicated unit testing framework. It's more natural to perform
these unit tests using the new unit test framework.
Let's migrate the unit tests for strcmp-offset functionality from the
legacy approach using the test-tool command `test-tool strcmp-offset` in
helper/test-strcmp-offset.c to the new unit testing framework
(t/unit-tests/test-lib.h).
The migration involves refactoring the tests to utilize the testing
macros provided by the framework (TEST() and check_*()).
Helped-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-authored-by: Achu Luma <ach.lumap@gmail.com>
Signed-off-by: Achu Luma <ach.lumap@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "test-tool" has been taught to run testsuite tests in parallel,
bypassing the need to use the "prove" tool.
* js/unit-test-suite-runner:
cmake: let `test-tool` run the unit tests, too
ci: use test-tool as unit test runner on Windows
t/Makefile: run unit tests alongside shell tests
unit tests: add rule for running with test-tool
test-tool run-command testsuite: support unit tests
test-tool run-command testsuite: remove hardcoded filter
test-tool run-command testsuite: get shell from env
t0080: turn t-basic unit test into a helper
While t/unit-tests/t-basic.c uses the unit-test framework added in
e137fe3b29 (unit tests: add TAP unit test framework, 2023-11-09), it is
not a true unit test in that it intentionally fails in order to exercise
various codepaths in the unit-test framework. Thus, we intentionally
exclude it when running unit tests through the various t/Makefile
targets. Instead, it is executed by t0080-unit-test-output.sh, which
verifies its output follows the TAP format expected for the various
pass, skip, or fail cases.
As such, it makes more sense for t-basic to be a helper item for
t0080-unit-test-output.sh, so let's move it to
t/helper/test-example-tap.c and adjust Makefiles as necessary.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.
* eb/hash-transition: (30 commits)
t1016-compatObjectFormat: add tests to verify the conversion between objects
t1006: test oid compatibility with cat-file
t1006: rename sha1 to oid
test-lib: compute the compatibility hash so tests may use it
builtin/ls-tree: let the oid determine the output algorithm
object-file: handle compat objects in check_object_signature
tree-walk: init_tree_desc take an oid to get the hash algorithm
builtin/cat-file: let the oid determine the output algorithm
rev-parse: add an --output-object-format parameter
repository: implement extensions.compatObjectFormat
object-file: update object_info_extended to reencode objects
object-file-convert: convert commits that embed signed tags
object-file-convert: convert commit objects when writing
object-file-convert: don't leak when converting tag objects
object-file-convert: convert tag objects when writing
object-file-convert: add a function to convert trees between algorithms
object: factor out parse_mode out of fast-import and tree-walk into in object.h
cache: add a function to read an OID of a specific algorithm
tag: sign both hashes
commit: export add_header_signature to support handling signatures on tags
...
The priority queue test has been migrated to the unit testing
framework.
* cp/unit-test-prio-queue:
tests: move t0009-prio-queue.sh to the new unit testing framework
t/t0009-prio-queue.sh along with t/helper/test-prio-queue.c unit
tests Git's implementation of a priority queue. Migrate the
test over to the new unit testing framework to simplify debugging
and reduce test run-time. Refactor the required logic and add
a new test case in addition to porting over the original ones in
shell.
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the recent codebase update (8bf6fbd00d (Merge branch
'js/doc-unit-tests', 2023-12-09)), a new unit testing framework was
merged, providing a standardized approach for testing C code. Prior to
this update, some unit tests relied on the test helper mechanism,
lacking a dedicated unit testing framework. It's more natural to perform
these unit tests using the new unit test framework.
This commit migrates the unit tests for C character classification
functions (isdigit(), isspace(), etc) from the legacy approach
using the test-tool command `test-tool ctype` in t/helper/test-ctype.c
to the new unit testing framework (t/unit-tests/test-lib.h).
The migration involves refactoring the tests to utilize the testing
macros provided by the framework (TEST() and check_*()).
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Achu Luma <ach.lumap@gmail.com>
Acked-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For now, this is just a rename from `t/helper/test-fast-rebase.c` into
`builtin/replay.c` with minimal changes to make it build appropriately.
Let's add a stub documentation and a stub test script though.
Subsequent commits will flesh out the capabilities of the new command
and make it a more standard regular builtin.
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Co-authored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The index file has room only for lower 32-bit of the file size in
the cached stat information, which means cached stat information
will have 0 in its sd_size member for a file whose size is multiple
of 4GiB. This is mistaken for a racily clean path. Avoid it by
storing a bogus sd_size value instead for such files.
* bc/racy-4gb-files:
Prevent git from rehashing 4GiB files
t: add a test helper to truncate files
In a future commit, we're going to work with some large files which will
be at least 4 GiB in size. To take advantage of the sparseness
functionality on most Unix systems and avoid running the system out of
disk, it would be convenient to use truncate(2) to simply create a
sparse file of sufficient size.
However, the GNU truncate(1) utility isn't portable, so let's write a
tiny test helper that does the work for us.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For now my strategy is simple. Create two identical repositories one
in each format. Use fixed timestamps. Verify the dynamically computed
compatibility objects from one repository match the objects stored in
the other repository.
A general limitation of this strategy is that the git when generating
signed tags and commits with compatObjectFormat enabled will generate
a signature for both formats. To overcome this limitation I have
added "test-tool delete-gpgsig" that when fed an signed commit or tag
with two signatures deletes one of the signatures.
With that in place I can have "git commit" and "git tag" generate
signed objects, have my tool delete one, and feed the new object
into "git hash-object" to create the kinds of commits and tags
git without compatObjectFormat enabled will generate.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a following commit, we will make it possible to separate objects in
different packfiles depending on a filter.
To make sure that the right objects are in the right packs, let's add a
new test-tool that can display which packfile(s) a given object is in.
Let's also make it possible to check if a given object is in the
expected number of packfiles with a `--check-count <n>` option.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As "git update-index --show-index-version" can do the same thing,
the 'index-version' subcommand in the test-tool lost its reason to
exist. Remove it and replace its use with the end-user facing
'git update-index --show-index-version'.
Helped-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since [1] there has been no reason for keeping "git env--helper" a
built-in. The reason it was a built-in to begin with was to support
the GIT_TEST_GETTEXT_POISON mode removed in that commit. I.e. unlike
the rest of "test-tool" it would potentially be called by the
installed git via "git-sh-i18n.sh".
As none of that applies since [1] we should stop carrying this
technical debt, and move it to t/helper/*. As this mostly move-only
change shows this has the nice bonus that we'll stop wasting time
translating the internal-only strings it emits.
Even though this was a built-in, it was intentionally never
documented, see its introduction in [2]. It never saw use outside of
the test suite, except for the "GIT_TEST_GETTEXT_POISON" use-case
noted above.
1. d162b25f95 (tests: remove support for GIT_TEST_GETTEXT_POISON,
2021-01-20)
2. b4f207f339 (env--helper: new undocumented builtin wrapping
git_env_*(), 2019-06-21)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid calling 'cache_tree_update()' when doing so would be redundant.
* vd/skip-cache-tree-update:
rebase: use 'skip_cache_tree_update' option
read-tree: use 'skip_cache_tree_update' option
reset: use 'skip_cache_tree_update' option
unpack-trees: add 'skip_cache_tree_update' option
cache-tree: add perf test comparing update and prime
Add a performance test comparing the execution times of 'prime_cache_tree()'
and 'cache_tree_update(_, WRITE_TREE_SILENT | WRITE_TREE_REPAIR)'. The goal
of comparing these two is to identify which is the faster method for
rebuilding an invalid cache tree, ultimately to remove one when both are
(reundantly) called in immediate succession.
Both methods are fast, so the new tests in 'p0090-cache-tree.sh' must call
each tested function multiple times to ensure the reported times (to 0.01s
resolution) convey the differences between them.
The tests compare the timing of a 'test-tool cache-tree' run as a no-op (to
capture a baseline for the overhead associated with running the tool),
'cache_tree_update()', and 'prime_cache_tree()' on four scenarios:
- A completely valid cache tree
- A cache tree with 2 invalid paths
- A cache tree with 50 invalid paths
- A completely empty cache tree
Example results:
Test this tree
-----------------------------------------------------------
0090.2: no-op, clean 1.27(0.48+0.52)
0090.3: prime_cache_tree, clean 2.02(0.83+0.85)
0090.4: cache_tree_update, clean 1.30(0.49+0.54)
0090.5: no-op, invalidate 2 1.29(0.48+0.54)
0090.6: prime_cache_tree, invalidate 2 1.98(0.81+0.83)
0090.7: cache_tree_update, invalidate 2 2.12(0.94+0.86)
0090.8: no-op, invalidate 50 1.32(0.50+0.55)
0090.9: prime_cache_tree, invalidate 50 2.10(0.86+0.89)
0090.10: cache_tree_update, invalidate 50 2.35(1.14+0.90)
0090.11: no-op, empty 1.33(0.50+0.54)
0090.12: prime_cache_tree, empty 2.04(0.84+0.87)
0090.13: cache_tree_update, empty 2.51(1.27+0.92)
These timings show that, while 'cache_tree_update()' is faster when the
cache tree is completely valid, it is equal to or slower than
'prime_cache_tree()' when there are any invalid paths. Since the redundant
calls are mostly in scenarios where the cache tree will be at least
partially invalid (e.g., 'git reset --hard'), 'prime_cache_tree()' will
likely perform better than 'cache_tree_update()' in typical cases.
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Address the root cause of technical debt we've been carrying since
sha1collisiondetection was made the default in [1]. In a preceding
commit we narrowly fixed a bug where the "DC_SHA1" variable would be
unset (in combination with "NO_APPLE_COMMON_CRYPTO=" on OSX), even
though we had the sha1collisiondetection library enabled.
But the only reason we needed to have such a user-exposed knob went
away with [1], and it's been doing nothing useful since then. We don't
care if you define DC_SHA1=*, we only care that you don't ask for any
other SHA-1 implementation. If it turns out that you didn't, we'll use
sha1collisiondetection, whether you had "DC_SHA1" set or not.
As a result of this being confusing we had e.g. [2] for cmake and the
recent [3] for ci/lib.sh setting "DC_SHA1" explicitly, even though
this was always a NOOP.
A much simpler way to do this is to stop having the Makefile and
CMakeLists.txt set "DC_SHA1" to be picked up by the test-lib.sh, let's
instead add a trivial "test-tool sha1-is-sha1dc". It returns zero if
we're using sha1collisiondetection, non-zero otherwise.
1. e6b07da278 (Makefile: make DC_SHA1 the default, 2017-03-17)
2. c4b2f41b5f (cmake: support for testing git with ctest, 2020-06-26)
3. 1ad5c3df35 (ci: use DC_SHA1=YesPlease on osx-clang job for CI,
2022-10-20)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Define the logical elements of a "bundle list", data structure to
store them in-core, format to transfer them, and code to parse
them.
* ds/bundle-uri-3:
bundle-uri: suppress stderr from remote-https
bundle-uri: quiet failed unbundlings
bundle: add flags to verify_bundle()
bundle-uri: fetch a list of bundles
bundle: properly clear all revision flags
bundle-uri: limit recursion depth for bundle lists
bundle-uri: parse bundle list in config format
bundle-uri: unit test "key=value" parsing
bundle-uri: create "key=value" line parsing
bundle-uri: create base key-value pair parsing
bundle-uri: create bundle_list struct and helpers
bundle-uri: use plain string in find_temp_filename()
Create a new 'test-tool bundle-uri' test helper. This helper will assist
in testing logic deep in the bundle URI feature.
This change introduces the 'parse-key-values' subcommand, which parses
an input file as a list of lines. These are fed into
bundle_uri_parse_line() to test how we construct a 'struct bundle_list'
from that data. The list is then output to stdout as if the key-value
pairs were a Git config file.
We use an input file instead of stdin because of a future change to
parse in config-file format that works better as an input file.
Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new "test-tool submodule" and move the "is-active" subcommand
over to it. It was added in 5c2bd8b77a (submodule--helper: add
is-active subcommand, 2017-03-16), since
a452128a36 (submodule--helper: introduce add-config subcommand,
2021-08-06) it hasn't been used by git-submodule.sh.
Since we're creating a command dispatch similar to test-tool.c itself
let's split out the "struct test_cmd" into a new test-tool-utils.h,
which both this new code and test-tool.c itself can use.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce the "subcommand" mode to parse-options API and update the
command line parser of Git commands with subcommands.
* sg/parse-options-subcommand: (23 commits)
remote: run "remote rm" argv through parse_options()
maintenance: add parse-options boilerplate for subcommands
pass subcommand "prefix" arguments to parse_options()
builtin/worktree.c: let parse-options parse subcommands
builtin/stash.c: let parse-options parse subcommands
builtin/sparse-checkout.c: let parse-options parse subcommands
builtin/remote.c: let parse-options parse subcommands
builtin/reflog.c: let parse-options parse subcommands
builtin/notes.c: let parse-options parse subcommands
builtin/multi-pack-index.c: let parse-options parse subcommands
builtin/hook.c: let parse-options parse subcommands
builtin/gc.c: let parse-options parse 'git maintenance's subcommands
builtin/commit-graph.c: let parse-options parse subcommands
builtin/bundle.c: let parse-options parse subcommands
parse-options: add support for parsing subcommands
parse-options: drop leading space from '--git-completion-helper' output
parse-options: clarify the limitations of PARSE_OPT_NODASH
parse-options: PARSE_OPT_KEEP_UNKNOWN only applies to --options
api-parse-options.txt: fix description of OPT_CMDMODE
t0040-parse-options: test parse_options() with various 'parse_opt_flags'
...
Several Git commands have subcommands to implement mutually exclusive
"operation modes", and they usually parse their subcommand argument
with a bunch of if-else if statements.
Teach parse-options to handle subcommands as well, which will result
in shorter and simpler code with consistent error handling and error
messages on unknown or missing subcommand, and it will also make
possible for our Bash completion script to handle subcommands
programmatically.
The approach is guided by the following observations:
- Most subcommands [1] are implemented in dedicated functions, and
most of those functions [2] either have a signature matching the
'int cmd_foo(int argc, const char **argc, const char *prefix)'
signature of builtin commands or can be trivially converted to
that signature, because they miss only that last prefix parameter
or have no parameters at all.
- Subcommand arguments only have long form, and they have no double
dash prefix, no negated form, and no description, and they don't
take any arguments, and can't be abbreviated.
- There must be exactly one subcommand among the arguments, or zero
if the command has a default operation mode.
- All arguments following the subcommand are considered to be
arguments of the subcommand, and, conversely, arguments meant for
the subcommand may not preceed the subcommand.
So in the end subcommand declaration and parsing would look something
like this:
parse_opt_subcommand_fn *fn = NULL;
struct option builtin_commit_graph_options[] = {
OPT_STRING(0, "object-dir", &opts.obj_dir, N_("dir"),
N_("the object directory to store the graph")),
OPT_SUBCOMMAND("verify", &fn, graph_verify),
OPT_SUBCOMMAND("write", &fn, graph_write),
OPT_END(),
};
argc = parse_options(argc, argv, prefix, options,
builtin_commit_graph_usage, 0);
return fn(argc, argv, prefix);
Here each OPT_SUBCOMMAND specifies the name of the subcommand and the
function implementing it, and the address of the same 'fn' subcommand
function pointer. parse_options() then processes the arguments until
it finds the first argument matching one of the subcommands, sets 'fn'
to the function associated with that subcommand, and returns, leaving
the rest of the arguments unprocessed. If none of the listed
subcommands is found among the arguments, parse_options() will show
usage and abort.
If a command has a default operation mode, 'fn' should be initialized
to the function implementing that mode, and parse_options() should be
invoked with the PARSE_OPT_SUBCOMMAND_OPTIONAL flag. In this case
parse_options() won't error out when not finding any subcommands, but
will return leaving 'fn' unchanged. Note that if that default
operation mode has any --options, then the PARSE_OPT_KEEP_UNKNOWN_OPT
flag is necessary as well (otherwise parse_options() would error out
upon seeing the unknown option meant to the default operation mode).
Some thoughts about the implementation:
- The same pointer to 'fn' must be specified as 'value' for each
OPT_SUBCOMMAND, because there can be only one set of mutually
exclusive subcommands; parse_options() will BUG() otherwise.
There are other ways to tell parse_options() where to put the
function associated with the subcommand given on the command line,
but I didn't like them:
- Change parse_options()'s signature by adding a pointer to
subcommand function to be set to the function associated with
the given subcommand, affecting all callsites, even those that
don't have subcommands.
- Introduce a specific parse_options_and_subcommand() variant
with that extra funcion parameter.
- I decided against automatically calling the subcommand function
from within parse_options(), because:
- There are commands that have to perform additional actions
after option parsing but before calling the function
implementing the specified subcommand.
- The return code of the subcommand is usually the return code
of the git command, but preserving the return code of the
automatically called subcommand function would have made the
API awkward.
- Also add a OPT_SUBCOMMAND_F() variant to allow specifying an
option flag: we have two subcommands that are purposefully
excluded from completion ('git remote rm' and 'git stash save'),
so they'll have to be specified with the PARSE_OPT_NOCOMPLETE
flag.
- Some of the 'parse_opt_flags' don't make sense with subcommands,
and using them is probably just an oversight or misunderstanding.
Therefore parse_options() will BUG() when invoked with any of the
following flags while the options array contains at least one
OPT_SUBCOMMAND:
- PARSE_OPT_KEEP_DASHDASH: parse_options() stops parsing
arguments when encountering a "--" argument, so it doesn't
make sense to expect and keep one before a subcommand, because
it would prevent the parsing of the subcommand.
However, this flag is allowed in combination with the
PARSE_OPT_SUBCOMMAND_OPTIONAL flag, because the double dash
might be meaningful for the command's default operation mode,
e.g. to disambiguate refs and pathspecs.
- PARSE_OPT_STOP_AT_NON_OPTION: As its name suggests, this flag
tells parse_options() to stop as soon as it encouners a
non-option argument, but subcommands are by definition not
options... so how could they be parsed, then?!
- PARSE_OPT_KEEP_UNKNOWN: This flag can be used to collect any
unknown --options and then pass them to a different command or
subsystem. Surely if a command has subcommands, then this
functionality should rather be delegated to one of those
subcommands, and not performed by the command itself.
However, this flag is allowed in combination with the
PARSE_OPT_SUBCOMMAND_OPTIONAL flag, making possible to pass
--options to the default operation mode.
- If the command with subcommands has a default operation mode, then
all arguments to the command must preceed the arguments of the
subcommand.
AFAICT we don't have any commands where this makes a difference,
because in those commands either only the command accepts any
arguments ('notes' and 'remote'), or only the default subcommand
('reflog' and 'stash'), but never both.
- The 'argv' array passed to subcommand functions currently starts
with the name of the subcommand. Keep this behavior. AFAICT no
subcommand functions depend on the actual content of 'argv[0]',
but the parse_options() call handling their options expects that
the options start at argv[1].
- To support handling subcommands programmatically in our Bash
completion script, 'git cmd --git-completion-helper' will now list
both subcommands and regular --options, if any. This means that
the completion script will have to separate subcommands (i.e.
words without a double dash prefix) from --options on its own, but
that's rather easy to do, and it's not much work either, because
the number of subcommands a command might have is rather low, and
those commands accept only a single --option or none at all. An
alternative would be to introduce a separate option that lists
only subcommands, but then the completion script would need not
one but two git invocations and command substitutions for commands
with subcommands.
Note that this change doesn't affect the behavior of our Bash
completion script, because when completing the --option of a
command with subcommands, e.g. for 'git notes --<TAB>', then all
subcommands will be filtered out anyway, as none of them will
match the word to be completed starting with that double dash
prefix.
[1] Except 'git rerere', because many of its subcommands are
implemented in the bodies of the if-else if statements parsing the
command's subcommand argument.
[2] Except 'credential', 'credential-store' and 'fsmonitor--daemon',
because some of the functions implementing their subcommands take
special parameters.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 't0040-parse-options.sh' we thoroughly test the parsing of all
types and forms of options, but in all those tests parse_options() is
always invoked with a 0 flags parameter.
Add a few tests to demonstrate how various 'enum parse_opt_flags'
values are supposed to influence option parsing.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This script is currently used by three test files: t0021-conversion.sh,
t2080-parallel-checkout-basics.sh, and
t2082-parallel-checkout-attributes.sh. To avoid the need for the PERL
dependency at these tests, let's convert the script to a C test-tool
command. The following commit will take care of actually modifying the
said tests to use the new C helper and removing the Perl script.
The Perl script flushes the log file handler after each write. As
commented in [1], this seems to be an early design decision that was
later reconsidered, but possibly ended up being left in the code by
accident:
>> +$debug->flush();
>
> Isn't $debug flushed automatically?
Maybe, but autoflush is not explicitly enabled. I will
enable it again (I disabled it because of Eric's comment
but I re-read the comment and he is only talking about
pipes).
Anyways, this behavior is not really needed for the tests and the
flush() calls make the code slightly larger, so let's avoid them
altogether in the new C version.
[1]: https://lore.kernel.org/git/7F1F1A0E-8FC3-4FBD-81AA-37786DE0EF50@gmail.com/
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More fsmonitor--daemon.
* jh/builtin-fsmonitor-part3: (30 commits)
t7527: improve implicit shutdown testing in fsmonitor--daemon
fsmonitor--daemon: allow --super-prefix argument
t7527: test Unicode NFC/NFD handling on MacOS
t/lib-unicode-nfc-nfd: helper prereqs for testing unicode nfc/nfd
t/helper/hexdump: add helper to print hexdump of stdin
fsmonitor: on macOS also emit NFC spelling for NFD pathname
t7527: test FSMonitor on case insensitive+preserving file system
fsmonitor: never set CE_FSMONITOR_VALID on submodules
t/perf/p7527: add perf test for builtin FSMonitor
t7527: FSMonitor tests for directory moves
fsmonitor: optimize processing of directory events
fsm-listen-darwin: shutdown daemon if worktree root is moved/renamed
fsm-health-win32: force shutdown daemon if worktree root moves
fsm-health-win32: add polling framework to monitor daemon health
fsmonitor--daemon: stub in health thread
fsmonitor--daemon: rename listener thread related variables
fsmonitor--daemon: prepare for adding health thread
fsmonitor--daemon: cd out of worktree root
fsm-listen-darwin: ignore FSEvents caused by xattr changes on macOS
unpack-trees: initialize fsmonitor_has_run_once in o->result
...
Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next patch, we will implement and test support for writing a
cruft pack via a special mode of `git pack-objects`. To make sure that
objects are written with the correct timestamps, and a new test-tool
that can dump the object names and corresponding timestamps from a given
`.mtimes` file.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create an IPC client to send query and flush commands to the daemon.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are many situations in which having access to a cryptographically
secure pseudorandom number generator (CSPRNG) is helpful. In the
future, we'll encounter one of these when dealing with temporary files.
To make this possible, let's add a function which reads from a system
CSPRNG and returns some bytes.
We know that all systems will have such an interface. A CSPRNG is
required for a secure TLS or SSH implementation and a Git implementation
which provided neither would be of little practical use. In addition,
POSIX is set to standardize getentropy(2) in the next version, so in the
(potentially distant) future we can rely on that.
For systems which lack one of the other interfaces, we provide the
ability to use OpenSSL's CSPRNG. OpenSSL is highly portable and
functions on practically every known OS, and we know it will have access
to some source of cryptographically secure randomness. We also provide
support for the arc4random in libbsd for folks who would prefer to use
that.
Because this is a security sensitive interface, we take some
precautions. We either succeed by filling the buffer completely as we
requested, or we fail. We don't return partial data because the caller
will almost never find that to be a useful behavior.
Specify a makefile knob which users can use to specify one or more
suitable CSPRNGs, and turn the multiple string options into a set of
defines, since we cannot match on strings in the preprocessor. We allow
multiple options to make the job of handling this in autoconf easier.
The order of options is important here. On systems with arc4random,
which is most of the BSDs, we use that, since, except on MirBSD and
macOS, it uses ChaCha20, which is extremely fast, and sits entirely in
userspace, avoiding a system call. We then prefer getrandom over
getentropy, because the former has been available longer on Linux, and
then OpenSSL. Finally, if none of those are available, we use
/dev/urandom, because most Unix-like operating systems provide that API.
We prefer options that don't involve device files when possible because
those work in some restricted environments where device files may not be
available.
Set the configuration variables appropriately for Linux and the BSDs,
including macOS, as well as Windows and NonStop. We specifically only
consider versions which receive publicly available security support
here. For the same reason, we don't specify getrandom(2) on Linux,
because CentOS 7 doesn't support it in glibc (although its kernel does)
and we don't want to resort to making syscalls.
Finally, add a test helper to allow this to be tested by hand and in
tests. We don't add any tests, since invoking the CSPRNG is not likely
to produce interesting, reproducible results.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This command dumps individual tables or a stack of of tables.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit provides basic utility classes for the reftable library.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>