This is a 64-bit value, hence having it first provides a better
alignment.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some compilers produce errors when arithmetic is attempted on pointers to
void. We want computations done on byte addresses, so cast them to char *
to work them around.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In addition to X86, PowerPC and S390 are capable of unaligned memory
accesses.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is needed on architectures with poor or non-existent unaligned memory
support and/or no fast byte swap instruction (such as ARM) by using byte
accesses to memory and shifting the result together.
This also makes the code portable, therefore the byte access methods are
the defaults. Any architecture that properly supports unaligned word
accesses in hardware simply has to enable the alternative methods.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is to make it easier for them to be selected individually depending
on the architecture instead of the other way around i.e. having each
architecture select a list of hacks up front. That makes for clearer
documentation as well.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the code around so specific architecture hacks are defined first.
Also make one line comments actually one line. No code change.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For x86 performance (especially in 32-bit mode) I added that hack to write
the SHA1 internal temporary hash using a volatile pointer, in order to get
gcc to not try to cache the array contents. Because gcc will do all the
wrong things, and then spill things in insane random ways.
But on architectures like PPC, where you have 32 registers, it's actually
perfectly reasonable to put the whole temporary array[] into the register
set, and gcc can do so.
So make the 'volatile unsigned int *' cast be dependent on a
SMALL_REGISTER_SET preprocessor symbol, and enable it (currently) on just
x86 and x86-64. With that, the routine is fairly reasonable even when
compared to the hand-scheduled PPC version. Ben Herrenschmidt reports on
a G5:
* Paulus asm version: about 3.67s
* Yours with no change: about 5.74s
* Yours without "volatile": about 3.78s
so with this the C version is within about 3% of the asm one.
And add a lot of commentary on what the heck is going on.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I think I have found a way to avoid the gcc crazyness.
Lookie here:
# TIME[s] SPEED[MB/s]
rfc3174 5.094 119.8
rfc3174 5.098 119.7
linus 1.462 417.5
linusas 2.008 304
linusas2 1.878 325
mozilla 5.566 109.6
mozillaas 5.866 104.1
openssl 1.609 379.3
spelvin 1.675 364.5
spelvina 1.601 381.3
nettle 1.591 383.6
notice? I outperform all the hand-tuned asm on 32-bit too. By quite a
margin, in fact.
Now, I didn't try a P4, and it's possible that it won't do that there, but
the 32-bit code generation sure looks impressive on my Nehalem box. The
magic? I force the stores to the 512-bit hash bucket to be done in order.
That seems to help a lot.
The diff is trivial (on top of the "rename registers with cpp" patch), as
appended. And it does seem to fix the P4 issues too, although I can
obviously (once again) only test Prescott, and only in 64-bit mode:
# TIME[s] SPEED[MB/s]
rfc3174 1.662 36.73
rfc3174 1.64 37.22
linus 0.2523 241.9
linusas 0.4367 139.8
linusas2 0.4487 136
mozilla 0.9704 62.9
mozillaas 0.9399 64.94
that's some really impressive improvement. All from just saying "do the
stores in the order I told you to, dammit!" to the compiler.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of letting the compiler to figure out the optimal way to rotate
register usage, explicitly rotate the register names with cpp.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
.. and simplify the ctx->size logic.
We now count the size in bytes, which means that 'lenW' was always just
the low 6 bits of the total size, so we don't carry it around separately
any more. And we do the 'size in bits' shift at the end.
Suggested by Nicolas Pitre and linux@horizon.com.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's an equivalent expression, but the '+' gives us some freedom in
instruction selection (for example, we can use 'lea' rather than 'add'),
and associates with the other additions around it to give some minor
scheduling freedom.
Suggested-by: linux@horizon.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid repeating the shared parts of the different rounds by adding a
macro layer or two. It was already more cpp than C.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The mozilla-SHA1 code did this 80-word array for the 80 iterations. But
the SHA1 state is really just 512 bits, and you can actually keep it in
a kind of "circular queue" of just 16 words instead.
This requires us to do the xor updates as we go along (rather than as a
pre-phase), but that's really what we want to do anyway.
This gets me really close to the OpenSSL performance on my Nehalem.
Look ma, all C code (ok, there's the rol/ror hack, but that one doesn't
strictly even matter on my Nehalem, it's just a local optimization).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps a teeny bit. But what I -really- want to do is to avoid the
whole 80-array loop, and do the xor updates as I go along..
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Bert Wesarg noticed non-x86 version of SHA_ROT() had a typo.
Also spell in-line assembly as __asm__(), otherwise I seem to get
error: implicit declaration of function 'asm' from my compiler.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the one with the smaller constant. It _can_ generate slightly
smaller code (a constant of 1 is special), but perhaps more importantly
it's possibly faster on any uarch that does a rotate with a loop.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Undo the change I picked up from the mailing list discussion suggested
by Nico, not because it is wrong, but it will be done at the end of the
follow-up series.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Based on the mozilla SHA1 routine, but doing the input data accesses a
word at a time and with 'htonl()' instead of loading bytes and shifting.
It requires an architecture that is ok with unaligned 32-bit loads and a
fast htonl().
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sb/parse-options:
prune-packed: migrate to parse-options
verify-pack: migrate to parse-options
verify-tag: migrate to parse-options
write-tree: migrate to parse-options
The option --merge was missing for submodule update and --cached for
submodule summary.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, the documentation suggests that 'git merge-base -a' and 'git
show-branch --merge-base' are equivalent (in fact it claims that the
former cannot handle more than two revs).
Alas, the handling of more than two revs is very different. Document
this by tests and correct the documentation to reflect this.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make sure that usage strings and documentation coincide with each other
and with the actual code.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
...so that it is easier to reuse it for other tests.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Importing the popen2 module in Python-2.6 results in the
"DeprecationWarning: The popen2 module is deprecated. Use the
subprocess module." message. The module itself isn't used in fact, so
just removing it solves the problem.
Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Equality between file_parent and file_name was being checked without a
preliminary check for existence of the parameters.
Fix by wrapping the equality check in appropriate if (defined ...),
rearranging the lines to prevent excessive length.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Acked-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With extra fixes from Thadeu and Carlos as well.
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Carlos R. Mafra <crmafra2@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current documentation states that servers typically listen on port
465 and calls this "ssmtp". While it's true that many mail servers use
port 465 for SSL smtp, this is non-standard, and hails from the days
before smtp and submission TLS support, that arrived in RFC2487 and
RFC3207. Port 465 is actually assigned by IANA for unrelated purposes,
and is mostly still used by mail servers today only to support Outlook
Express.
In any case, this patch helps the documentation better reflect both
standards and reality, while still helpfully mentioning ports numbers
that a user may wish to specify.
Signed-off-by: Wesley J. Landaker <wjl@icecavern.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current documentation confuses non-standard SSL smtp port 465 with
submission port 587 (RFC 4406). This patch just changes the referenced
number.
Signed-off-by: Wesley J. Landaker <wjl@icecavern.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Configuration values are expected to be quoted when they have leading or
trailing whitespace, but inner whitespace should be kept verbatim even if
the value is not quoted. This is already documented in git-config(1), but
the code caused inner whitespace to be collapsed to a single space,
breaking, for example, clones from a path that has two consecutive spaces
in it, as future fetches would only see a single space.
Reported-by: John te Bokkel <tanj.tanj@gmail.com>
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using git fast-export and git fast-import to rewrite the history
of a repository with large binary files, almost all of the time is
spent dealing with blobs. This is extremely inefficient if all we want
to do is rewrite the commits and tree structure. --no-data skips the
output of blobs and writes SHA-1s instead of marks, which provides a
massive speedup.
Signed-off-by: Geoffrey Irving <irving@naml.us>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is usually better to have positive options, to avoid confusing double
negations. However, sometimes it is desirable to show the negative option
in the help.
Introduce the flag PARSE_OPT_NEGHELP to do that.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow git request-pull to append diff body into the pull request.
It's useful for small series of commits.
Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* hv/cvsps-tests:
t/t9600: remove exit after test_done
cvsimport: extend testcase about patchset order to contain branches
cvsimport: add test illustrating a bug in cvsps
Add a test of "git cvsimport"'s handling of tags and branches
Add some tests of git-cvsimport's handling of vendor branches
Test contents of entire cvsimported "master" tree contents
Use CVS's -f option if available (ignore user's ~/.cvsrc file)
Start a library for cvsimport-related tests
The problem is that if a file was replaced with a directory containing
another file with the same content and mode, an attempt to merge it
with a branch descended from a commit before this F->D transition will
cause merge-recursive to break. It breaks even if there were no
conflicting changes on that other branch.
Originally reported by Anders Melchiorsen.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The next major release will be 1.6.5, hopefully with a shorter cycle
than the 1.6.4 cycle. After that in 1.7.0 we can make potentially
backward incompatible changes if necessary.
Signed-off-by: Junio C Hamano <gitster@pobox.com>