This is fewer lines of code, but more importantly, fixes a
bogus pointer offset. We are looking for "tar." in the
section, but later assume that the dot we found is at offset
9, not 3. This is a holdover from an earlier iteration of
767cf45 which called the section "tarfilter".
As a result, we could erroneously reject some filters with
dots in their name, as well as read uninitialized memory.
Reported by (and test by) René Scharfe.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The name field of a tar header has a size of 100 characters. This limit
was extended long ago in a backward compatible way by providing the
additional prefix field, which can hold 155 additional characters. The
actual path is constructed at extraction time by concatenating the prefix
field, a slash and the name field.
get_path_prefix() is used to determine which slash in the path is used as
the cutting point and thus which part of it is placed into the field
prefix and which into the field name. It tries to cram as much into the
prefix field as possible. (And only if we can't fit a path into the
provided 255 characters we use a pax extended header to store it.)
If a path is longer than 100 but shorter than 156 characters and ends
with a slash (i.e. is for a directory) then get_path_prefix() puts the
whole path in the prefix field and leaves the name field empty. GNU tar
reconstructs the path without complaint, but the tar included with
NetBSD 6 does not: It reports the header to be invalid.
For compatibility with this version of tar, make sure to never leave the
name field empty. In order to do that, trim the trailing slash from the
part considered as possible prefix, if it exists -- that way the last
path component (or more, but not less) will end up in the name field.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
POSIX.1 (pax) is pretty clear on this:
The chksum field shall be the ISO/IEC 646:1991 standard IRV
representation of the octal value of the simple sum of all octets
in the header logical record. Each octet in the header shall be
treated as an unsigned value. These values shall be added to an
unsigned integer, initialized to zero, the precision of which is
not less than 17 bits. When calculating the checksum, the chksum
field is treated as if it were all <space> characters.
so is GNU:
http://www.gnu.org/software/tar/manual/html_node/Checksumming.html
Found by 7zip folks and reported by Rafał Mużyło.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For correctness, don't needlessly drop the const qualifier when casting.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t5000 verifies output while t1050 makes sure the command always
respects core.bigfilethreshold
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
archive-tar.c and archive-zip.c now perform conversion check, with
help of sha1_file_to_archive() from archive.c
This gives backends more freedom in dealing with (streaming) large
blobs.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's used to be
if (!sha1) {
...
} else if (!path) {
...
} else {
...
}
Now that the first two blocks are no-op. We can remove the if/else
skeleton and put the else block back by one indent level.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before this patch write_tar_entry() can:
- write global header
by write_global_extended_header() calling write_tar_entry with
with both sha1 and path == NULL
- write extended header for symlinks, by write_tar_entry() calling
itself with sha1 != NULL and path == NULL
- write a normal blob. In this case both sha1 and path are valid.
After this patch, the first two call sites are modified to write the
header without calling write_tar_entry(). The function is now for
writing blobs only. This simplifies handling when write_tar_entry()
learns about large blobs.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tar filters may be very expensive to run, so sites do
not want to expose them via upload-archive. This patch lets
users configure tar.<filter>.remote to turn them off.
By default, gzip filters are left on, as they are about as
expensive as creating zip archives.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This works exactly as if the user had configured it via:
[tar "tgz"]
command = gzip -cn
[tar "tar.gz"]
command = gzip -cn
but since it is so common, it's convenient to have it
builtin without the user needing to do anything.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's common to pipe the tar output produce by "git archive"
through gzip or some other compressor. Locally, this can
easily be done by using a shell pipe. When requesting a
remote archive, though, it cannot be done through the
upload-archive interface.
This patch allows configurable tar filters, so that one
could define a "tar.gz" format that automatically pipes tar
output through gzip.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current archivers are very static; when you are in the
write_tar_archive function, you know you are writing a tar.
However, to facilitate runtime-configurable archivers
that will share a common write function we need to tell the
function which archiver was used.
As a convenience, we also provide an opaque data pointer in
the archiver struct so that individual archivers can put
something useful there when they register themselves.
Technically they could just use the "name" field to look in
an internal map of names to data, but this is much simpler.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the tar and zip code was nicely split out into two
abstracted files which knew only about their specific
formats. The entry point to this code was a single "write
archive" function.
However, as these basic formats grow more complex (e.g., by
handling multiple file extensions and format names), a
static list of the entry point functions won't be enough.
Instead, let's provide a way for the tar and zip code to
tell the main archive code what they support by registering
archiver names and functions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We load our own tar-specific config, and then chain to
git_default_config. This is pointless, as our caller should
already have loaded the default config. It also introduces a
needless inconsistency with the zip archiver, which does not
look at the config files at all (and therefore relies on the
caller to have loaded config).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On some systems, giving a value of type time_t to printf "%lo" that
expects an unsigned long would give a type mismatch warning.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many call sites use strbuf_init(&foo, 0) to initialize local
strbuf variable "foo" which has not been accessed since its
declaration. These can be replaced with a static initialization
using the STRBUF_INIT macro which is just as readable, saves a
function call, and takes up fewer lines.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Add the exported function write_archive_entries() to archive.c, which uses
the new ability of read_tree_recursive() to pass a context pointer to its
callback in order to centralize previously duplicated code.
The new callback function write_archive_entry() does the work that every
archiver backend needs to do: loading file contents, entering subdirectories,
handling file attributes, constructing the full path of the entry. All that
done, it calls the backend specific write_archive_entry_fn_t function.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Calculate the length of base and save it in a new member of struct
archiver_args. This way we don't have to compute it in each of the
format backends.
Note: parse_archive_args() guarantees that ->base won't ever be NULL.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a pointer parameter to read_tree_recursive(), which is passed to the
callback function. This allows callers of read_tree_recursive() to
share data with the callback without resorting to global variables. All
current callers pass NULL.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Paths marked with this attribute are not output to git-archive
output.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git_config() only had a function parameter, but no callback data
parameter. This assumes that all callback functions only modify
global variables.
With this patch, every callback gets a void * parameter, and it is hoped
that this will help the libification effort.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ulrik Sverdrup noticed that git-archive doesn't correctly apply the attribute
export-subst when the option --prefix is given, too.
When it checked if a file has the attribute turned on, git-archive would try
to look up the full path -- including the prefix -- in .gitattributes. That's
wrong, as the prefix doesn't need to have any relation to any existing
directories, tracked or not.
This patch makes git-archive ignore the prefix when looking up if value of the
attribute export-subst for a file.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Add strbuf_rtrim to remove trailing spaces.
* Add strbuf_insert to insert data at a given position.
* Off-by one fix in strbuf_addf: strbuf_avail() does not counts the final
\0 so the overflow test for snprintf is the strict comparison. This is
not critical as the growth mechanism chosen will always allocate _more_
memory than asked, so the second test will not fail. It's some kind of
miracle though.
* Add size extension hints for strbuf_init and strbuf_read. If 0, default
applies, else:
+ initial buffer has the given size for strbuf_init.
+ first growth checks it has at least this size rather than the
default 8192.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is just cleaner way to deal with strbufs, using its API rather than
reinventing it in the module (e.g. strbuf_append_string is just the plain
strbuf_addstr function, and it was used to perform what strbuf_addch does
anyways).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The gory details are explained in strbuf.h. The change of semantics this
patch enforces is that the embeded buffer has always a '\0' character after
its last byte, to always make it a C-string. The offs-by-one changes are all
related to that very change.
A strbuf can be used to store byte arrays, or as an extended string
library. The `buf' member can be passed to any C legacy string function,
because strbuf operations always ensure there is a terminating \0 at the end
of the buffer, not accounted in the `len' field of the structure.
A strbuf can be used to generate a string/buffer whose final size is not
really known, and then "strbuf_detach" can be used to get the built buffer,
and keep the wrapping "strbuf" structure usable for further work again.
Other interesting feature: strbuf_grow(sb, size) ensure that there is
enough allocated space in `sb' to put `size' new octets of data in the
buffer. It helps avoiding reallocating data for nothing when the problem the
strbuf helps to solve has a known typical size.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add support for a new attribute, specfile. Files marked as being
specfiles are expanded by git-archive when they are written to an
archive. It has no effect on worktree files. The same placeholders
as those for the option --pretty=format: of git-log et al. can be
used.
The attribute is useful for creating auto-updating specfiles. It is
limited by the underlying function format_commit_message(), though.
E.g. currently there is no placeholder for git-describe like output,
and expanded specfiles can't contain NUL bytes. That can be fixed
in format_commit_message() later and will then benefit users of
git-log, too.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted by Johan Herland, git-archive is a kind of checkout and needs
to apply any checkout filters that might be configured.
This patch adds the convenience function convert_sha1_file which returns
a buffer containing the object's contents, after converting, if necessary
(i.e. it's a combination of read_sha1_file and convert_to_working_tree).
Direct calls to read_sha1_file in git-archive are then replaced by calls
to convert_sha1_file.
Since convert_sha1_file expects its path argument to be NUL-terminated --
a convention it inherits from convert_to_working_tree -- the patch also
changes the path handling in archive-tar.c to always NUL-terminate the
string. It used to solely rely on the len field of struct strbuf before.
archive-zip.c already NUL-terminates the path and thus needs no such
change.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Both archive-tar and archive-zip needed to be taught about subprojects.
The tar function died when trying to read the subproject commit object,
while the zip function reported "unsupported file mode".
This fixes both by representing the subproject as an empty directory.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value. One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.
This is an initial step for the removal of the version using a char array
found in object reading code paths. The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
In order to make the generated tar files more friendly to users who
extract them as root using GNU tar and its implied -p option, change
the default umask to 002 and change the owner name and group name to
root. This ensures that a) the extracted files and directories are
not world-writable and b) that they belong to user and group root.
Before they would have been assigned to a user and/or group named
git if it existed. This also answers the question in the removed
comment: uid=0, gid=0, uname=root, gname=root is exactly what we
want.
Normal users who let tar apply their umask while extracting are
only affected if their umask allowed the world to change their
files (e.g. a umask of zero). This case is so unlikely and strange
that we don't need to support it.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a mechanical clean-up of the way *.c files include
system header files.
(1) sources under compat/, platform sha-1 implementations, and
xdelta code are exempt from the following rules;
(2) the first #include must be "git-compat-util.h" or one of
our own header file that includes it first (e.g. config.h,
builtin.h, pkt-line.h);
(3) system headers that are included in "git-compat-util.h"
need not be included in individual C source files.
(4) "git-compat-util.h" does not have to include subsystem
specific header files (e.g. expat.h).
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch doesn't change any functionality, it only moves code around. It
makes seeing the few remaining lines of git-tar-tree code easier. ;-)
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
generate_tar() eventually calls write_tar_archive() which does all the
"real" work and which also calls git_config(git_tar_config). We only
need to do this once.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch removes the custom tree walker tree_traverse(), and makes
generate_tar() use write_tar_archive() and the infrastructure provided
by git-archive instead.
As a kind of side effect, make write_tar_archive() able to handle NULL
as base directory, as this is what the new and simple generate_tar()
uses to indicate the absence of a base directory. This was simpler
and cleaner than playing tricks with empty strings.
The behaviour of git-tar-tree should be unchanged (quick tests didn't
indicate otherwise) except for the text of some error messages.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
(cherry picked from 5d2aea4cb383a43e40d47ab69d8ad7a495df6ea2 commit)
This is based on Rene Scharfe's earlier patch, but uses the
archiver support introduced by the previous patch.
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Acked-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Like xmalloc and xrealloc xstrdup dies with a useful message if
the native strdup() implementation returns NULL rather than a
valid pointer.
I just tried to use xstrdup in new code and found it to be missing.
However I expected it to be present as xmalloc and xrealloc are
already commonly used throughout the code.
[jc: removed the part that deals with last_XXX, which I am
finding more and more dubious these days.]
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
builtin-tar-tree.c::git_tar_config() and http-push.c::add_one_object()
are not used outside their own files.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The little helper write_or_die() won't come back with bad news about
full disks or broken pipes. It either succeeds or terminates the
program, making additional error handling unnecessary.
This patch adds the new function and uses it to replace two similar
ones (the one in tar-tree originally has been copied from cat-file
btw.). I chose to add the fd parameter which both lacked to make
write_or_die() just as flexible as write() and thus suitable for
lib-ification.
There is a regression: error messages emitted by this function don't
show the program name, while the replaced two functions did. That's
acceptable, I think; a lot of other functions do the same.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
In the name of Standardization, this cleanses the last usage string of
mystical creatures. But they still dwell deep within the source and in
some debug messages, it is said.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Free the root tree object buffer when we're done, plugging a minor leak
in generate_tar(). Note: we cannot simply free(tree.buf) because this
pointer is modified by tree_entry() calls in traverse_tree().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This changes the calling convention of built-in commands and
passes the "prefix" (i.e. pathname of $PWD relative to the
project root level) down to them.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
By default, git-tar-tree(1) sets file and directories modes to 0666
or 0777. While this is both useful and acceptable for projects such
as the Linux Kernel, it might be excessive for other projects. With
this variable, it becomes possible to tell git-tar-tree(1) to apply
a specific umask to the modes above. The special value "user"
indicates that the user's current umask will be used. This should be
enough for most projects, as it will lead to the same permissions as
git-checkout(1) would use. The default value remains 0, which means
world read-write.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Acked-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This cleans up the use of safe_strncpy() even more. Since it has the
same semantics as strlcpy() use this name instead. Also move the
definition from inside path.c to its own file compat/strlcpy.c, and use
it conditionally at compile time, since some platforms already has
strlcpy(). It's included in the same way as compat/setenv.c.
Signed-off-by: Peter Eriksen <s022018@student.dtu.dk>
Signed-off-by: Junio C Hamano <junkio@cox.net>