| Tweaking diff output
 | |
| ====================
 | |
| June 2005
 | |
| 
 | |
| 
 | |
| Introduction
 | |
| ------------
 | |
| 
 | |
| The diff commands git-diff-index, git-diff-files, and git-diff-tree
 | |
| can be told to manipulate differences they find in
 | |
| unconventional ways before showing diff(1) output.  These manipulations
 | |
| are collectively called "diffcore transformations".  This short note
 | |
| describes what they are and how to use them to produce diff outputs
 | |
| that are easier to understand than the conventional kind.
 | |
| 
 | |
| 
 | |
| The chain of operation
 | |
| ----------------------
 | |
| 
 | |
| The git-diff-* family works by first comparing two sets of
 | |
| files:
 | |
| 
 | |
|  - git-diff-index compares contents of a "tree" object and the
 | |
|    working directory (when '\--cached' flag is not used) or a
 | |
|    "tree" object and the index file (when '\--cached' flag is
 | |
|    used);
 | |
| 
 | |
|  - git-diff-files compares contents of the index file and the
 | |
|    working directory;
 | |
| 
 | |
|  - git-diff-tree compares contents of two "tree" objects.
 | |
| 
 | |
| In all of these cases, the commands themselves compare
 | |
| corresponding paths in the two sets of files.  The result of
 | |
| comparison is passed from these commands to what is internally
 | |
| called "diffcore", in a format similar to what is output when
 | |
| the -p option is not used.  E.g.
 | |
| 
 | |
| ------------------------------------------------
 | |
| in-place edit  :100644 100644 bcd1234... 0123456... M file0
 | |
| create         :000000 100644 0000000... 1234567... A file4
 | |
| delete         :100644 000000 1234567... 0000000... D file5
 | |
| unmerged       :000000 000000 0000000... 0000000... U file6
 | |
| ------------------------------------------------
 | |
| 
 | |
| The diffcore mechanism is fed a list of such comparison results
 | |
| (each of which is called a "filepair", although at this point each
 | |
| of them talks about a single file), and transforms such a list
 | |
| into another list.  There are currently 6 such transformations:
 | |
| 
 | |
| - diffcore-pathspec
 | |
| - diffcore-break
 | |
| - diffcore-rename
 | |
| - diffcore-merge-broken
 | |
| - diffcore-pickaxe
 | |
| - diffcore-order
 | |
| 
 | |
| These are applied in sequence.  The set of filepairs git-diff-\*
 | |
| commands find is used as the input to diffcore-pathspec, and
 | |
| the output from diffcore-pathspec is used as the input to the
 | |
| next transformation.  The final result is then passed to the
 | |
| output routine, which generates either diff-raw format (see Output
 | |
| format sections of the manual for git-diff-\* commands) or
 | |
| diff-patch format.
 | |
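The chain described above can be sketched in a few lines of Python. This is only an illustration, not how the git-diff-* commands are implemented: the names `Filepair`, `parse_raw` and `run_chain` are hypothetical, and the parser assumes whitespace-separated fields and paths without spaces.

```python
from typing import Callable, List, NamedTuple, Optional


class Filepair(NamedTuple):
    """One comparison result, mirroring the fields of a diff-raw line."""
    old_mode: str
    new_mode: str
    old_sha: str
    new_sha: str
    status: str                     # e.g. "M", "A", "D", "U", "R100"
    path: str
    dst_path: Optional[str] = None  # present only for renames and copies


def parse_raw(line: str) -> Filepair:
    """Parse a line such as ':100644 100644 bcd1234... 0123456... M file0'.

    Assumption: fields are separated by whitespace and paths contain none.
    """
    fields = line.lstrip(":").split()
    old_mode, new_mode, old_sha, new_sha, status = fields[:5]
    paths = fields[5:]
    dst_path = paths[1] if len(paths) > 1 else None
    return Filepair(old_mode, new_mode, old_sha, new_sha, status,
                    paths[0], dst_path)


# A transformation maps one list of filepairs to another list.
Transformation = Callable[[List[Filepair]], List[Filepair]]


def run_chain(pairs: List[Filepair],
              chain: List[Transformation]) -> List[Filepair]:
    """Feed the output of each transformation to the next one."""
    for transform in chain:
        pairs = transform(pairs)
    return pairs
```

Each transformation only sees the list produced by the previous one, which is why their order matters.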
| 
 | |
| 
 | |
| diffcore-pathspec: For Ignoring Files Outside Our Consideration
 | |
| ---------------------------------------------------------------
 | |
| 
 | |
| The first transformation in the chain is diffcore-pathspec, and
 | |
| is controlled by giving the pathname parameters to the
 | |
| git-diff-* commands on the command line.  The pathspec is used
 | |
| to limit the world diff operates in.  It removes the filepairs
 | |
| outside the specified set of pathnames.  E.g. If the input set
 | |
| of filepairs included:
 | |
| 
 | |
| ------------------------------------------------
 | |
| :100644 100644 bcd1234... 0123456... M junkfile
 | |
| ------------------------------------------------
 | |
| 
 | |
| but the command invocation was "git-diff-files myfile", then the
 | |
| junkfile entry would be removed from the list because only "myfile"
 | |
| is under consideration.
 | |
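As a rough sketch of this filtering step, assuming filepairs are reduced to (status, path) tuples and that a pathspec naming a directory also covers everything under it (an assumption this note does not spell out), the hypothetical helper below keeps only the filepairs under consideration:

```python
def matches_pathspec(path, pathspecs):
    """Return True if path equals a pathspec or lies under it.

    Assumption: an empty pathspec list means nothing is filtered out.
    """
    if not pathspecs:
        return True
    for spec in pathspecs:
        spec = spec.rstrip("/")
        if path == spec or path.startswith(spec + "/"):
            return True
    return False


def diffcore_pathspec(pairs, pathspecs):
    """Drop (status, path) filepairs outside the given pathspecs."""
    return [pair for pair in pairs if matches_pathspec(pair[1], pathspecs)]


pairs = [("M", "junkfile"), ("M", "myfile")]
remaining = diffcore_pathspec(pairs, ["myfile"])  # junkfile is removed
```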
| 
 | |
| Implementation note.  For performance reasons, git-diff-tree
 | |
| uses the pathname parameters on the command line to cull the set of
 | |
| filepairs it feeds the diffcore mechanism itself, and does not
 | |
| use diffcore-pathspec, but the end result is the same.
 | |
| 
 | |
| 
 | |
| diffcore-break: For Splitting Up "Complete Rewrites"
 | |
| ----------------------------------------------------
 | |
| 
 | |
| The second transformation in the chain is diffcore-break, and is
 | |
| controlled by the -B option to the git-diff-* commands.  This is
 | |
| used to detect a filepair that represents a "complete rewrite" and
 | |
| break such a filepair into two filepairs that represent delete and
 | |
| create.  E.g.  If the input contained this filepair:
 | |
| 
 | |
| ------------------------------------------------
 | |
| :100644 100644 bcd1234... 0123456... M file0
 | |
| ------------------------------------------------
 | |
| 
 | |
| and if it detects that the file "file0" is completely rewritten,
 | |
| it changes it to:
 | |
| 
 | |
| ------------------------------------------------
 | |
| :100644 000000 bcd1234... 0000000... D file0
 | |
| :000000 100644 0000000... 0123456... A file0
 | |
| ------------------------------------------------
 | |
| 
 | |
| For the purpose of breaking a filepair, diffcore-break examines
 | |
| the extent of changes between the contents of the files before
 | |
| and after modification (i.e. the contents that have "bcd1234..."
 | |
| and "0123456..." as their SHA1 content ID, in the above
 | |
| example).  The amounts of deleted original content and
 | |
| inserted new material are added together, and if the sum exceeds
 | |
| the "break score", the filepair is broken into two.  The break
 | |
| score defaults to 50% of the size of the smaller of the original
 | |
| and the result (i.e. if the edit shrinks the file, the size of
 | |
| the result is used; if the edit lengthens the file, the size of
 | |
| the original is used), and can be customized by giving a number
 | |
| after "-B" option (e.g. "-B75" to tell it to use 75%).
 | |
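The break-score rule can be written down as a small calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes sizes and change amounts are measured in the same unit (lines, say), which this note does not specify.

```python
NULL_SHA = "0000000..."  # placeholder ID this note uses for "no content"


def exceeds_break_score(original_size, result_size, deleted, inserted,
                        break_percent=50):
    """Return True if deletion plus insertion exceeds the break score.

    The break score is break_percent of the smaller of the two sizes.
    """
    threshold = min(original_size, result_size) * break_percent / 100
    return deleted + inserted > threshold


def break_filepair(mode, old_sha, new_sha, path):
    """Split one modification into a delete and a create, as raw lines."""
    return [
        f":{mode} 000000 {old_sha} {NULL_SHA} D {path}",
        f":000000 {mode} {NULL_SHA} {new_sha} A {path}",
    ]


# 30 units deleted and 30 inserted in a 100-unit file: 60 > 50, so break.
if exceeds_break_score(100, 100, deleted=30, inserted=30):
    broken = break_filepair("100644", "bcd1234...", "0123456...", "file0")
```

With "-B75" the same edit would not be broken, because 60 does not exceed 75.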
| 
 | |
| 
 | |
| diffcore-rename: For Detecting Renames and Copies
 | |
| -------------------------------------------------
 | |
| 
 | |
| This transformation is used to detect renames and copies, and is
 | |
| controlled by the -M option (to detect renames) and the -C option
 | |
| (to detect copies as well) to the git-diff-* commands.  If the
 | |
| input contained these filepairs:
 | |
| 
 | |
| ------------------------------------------------
 | |
| :100644 000000 0123456... 0000000... D fileX
 | |
| :000000 100644 0000000... 0123456... A file0
 | |
| ------------------------------------------------
 | |
| 
 | |
| and the contents of the deleted file fileX are similar enough to
 | |
| the contents of the created file file0, then rename detection
 | |
| merges these filepairs and creates:
 | |
| 
 | |
| ------------------------------------------------
 | |
| :100644 100644 0123456... 0123456... R100 fileX file0
 | |
| ------------------------------------------------
 | |
| 
 | |
| When the "-C" option is used, the original contents of modified files,
 | |
| and deleted files (and also unmodified files, if the
 | |
| "\--find-copies-harder" option is used) are considered as candidates
 | |
| for the source file in a rename/copy operation.  If the input were like
 | |
| these filepairs, that talk about a modified file fileY and a newly
 | |
| created file file0:
 | |
| 
 | |
| ------------------------------------------------
 | |
| :100644 100644 0123456... 1234567... M fileY
 | |
| :000000 100644 0000000... bcd3456... A file0
 | |
| ------------------------------------------------
 | |
| 
 | |
| the original contents of fileY and the resulting contents of
 | |
| file0 are compared, and if they are similar enough, they are
 | |
| changed to:
 | |
| 
 | |
| ------------------------------------------------
 | |
| :100644 100644 0123456... 1234567... M fileY
 | |
| :100644 100644 0123456... bcd3456... C100 fileY file0
 | |
| ------------------------------------------------
 | |
| 
 | |
| In both rename and copy detection, the same "extent of changes"
 | |
| algorithm used in diffcore-break is used to determine if two
 | |
| files are "similar enough", and can be customized to use
 | |
| a similarity score different from the default of 50% by giving a
 | |
| number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
 | |
| 8/10 = 80%).
 | |
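To make the matching step concrete, here is a minimal sketch of rename detection. It is not the algorithm git uses: Python's `difflib.SequenceMatcher` stands in for the "extent of changes" computation, all function names are hypothetical, and reading the digits after "-M" as a decimal fraction is an assumption generalized from the single "-M8" example.

```python
import difflib


def parse_score(option, default=0.5):
    """Turn '-M8' into 0.8; a bare '-M' or '-C' gives the default.

    Assumption: the digits are read as a decimal fraction.
    """
    digits = option[2:]
    if not digits:
        return default
    return int(digits) / 10 ** len(digits)


def similarity(old_text, new_text):
    """Stand-in similarity in [0, 1], compared line by line."""
    matcher = difflib.SequenceMatcher(
        None, old_text.splitlines(), new_text.splitlines())
    return matcher.ratio()


def detect_renames(deleted, added, minimum=0.5):
    """Pair deleted paths with added paths whose contents are similar.

    deleted and added map path -> contents.  Returns a tuple of
    (renames, remaining_deleted, remaining_added).
    """
    renames = []
    remaining_added = dict(added)
    remaining_deleted = {}
    for old_path, old_text in deleted.items():
        best_path, best_score = None, 0.0
        for new_path, new_text in remaining_added.items():
            score = similarity(old_text, new_text)
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None and best_score >= minimum:
            status = f"R{int(best_score * 100):03d}"
            renames.append((status, old_path, best_path))
            del remaining_added[best_path]
        else:
            remaining_deleted[old_path] = old_text
    return renames, remaining_deleted, remaining_added
```

Copy detection would differ mainly in which files are offered as sources: modified files stay in the output alongside the detected copy.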
| 
 | |
| Note.  When the "-C" option is used with `\--find-copies-harder`
 | |
| option, git-diff-\* commands feed unmodified filepairs to the
 | |
| diffcore mechanism as well as modified ones.  This lets the copy
 | |
| detector consider unmodified files as copy source candidates at
 | |
| the expense of making it slower.  Without `\--find-copies-harder`,
 | |
| git-diff-\* commands can detect copies only if the file that was
 | |
| copied happened to have been modified in the same changeset.
 | |
| 
 | |
| 
 | |
| diffcore-merge-broken: For Putting "Complete Rewrites" Back Together
 | |
| --------------------------------------------------------------------
 | |
| 
 | |
| This transformation is used to merge filepairs broken by
 | |
| diffcore-break, and not transformed into rename/copy by
 | |
| diffcore-rename, back into a single modification.  This always
 | |
| runs when diffcore-break is used.
 | |
| 
 | |
| For the purpose of merging broken filepairs back, it uses a
 | |
| different "extent of changes" computation from the ones used by
 | |
| diffcore-break and diffcore-rename.  It counts only the deletion
 | |
| from the original, and does not count insertion.  If you removed
 | |
| only 10 lines from a 100-line document, even if you added 910
 | |
| new lines to make a new 1000-line document, you did not do a
 | |
| complete rewrite.  diffcore-break breaks such a case in order to
 | |
| help diffcore-rename to consider such filepairs as candidates for
 | |
| rename/copy detection, but if filepairs broken that way were not
 | |
| matched with other filepairs to create rename/copy, then this
 | |
| transformation merges them back into the original
 | |
| "modification".
 | |
| 
 | |
| The "extent of changes" parameter can be tweaked from the
 | |
| default 80% (that is, unless more than 80% of the original
 | |
| material is deleted, the broken pairs are merged back into a
 | |
| single modification) by giving a second number to -B option,
 | |
| like these:
 | |
| 
 | |
| * -B50/60 (give 50% "break score" to diffcore-break, use 60%
 | |
|   for diffcore-merge-broken).
 | |
| 
 | |
| * -B/60 (the same as above, since diffcore-break defaults to 50%).
 | |
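The two numbers accepted by -B and the merge-back decision can be sketched as follows. The helper names are hypothetical, and the sketch assumes the amount of deletion is measured in the same unit as the size of the original.

```python
def parse_break_option(option):
    """Return (break_percent, merge_percent) for a -B option.

    Accepts forms such as '-B', '-B75', '-B50/60' and '-B/60'.
    A missing number falls back to the default of 50 or 80.
    """
    break_percent, merge_percent = 50, 80
    first, _, second = option[2:].partition("/")
    if first:
        break_percent = int(first)
    if second:
        merge_percent = int(second)
    return break_percent, merge_percent


def should_merge_back(original_size, deleted, merge_percent=80):
    """Merge a broken pair back unless more than merge_percent of the
    original material was deleted.  Insertions are not counted."""
    return deleted <= original_size * merge_percent / 100


# Removing 10 lines from a 100-line file is not a complete rewrite,
# no matter how many lines were added.
merged = should_merge_back(original_size=100, deleted=10)
```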
| 
 | |
| Note that earlier implementations left a broken pair as separate
 | |
| creation and deletion patches.  This was an unnecessary hack and
 | |
| the latest implementation always merges all the broken pairs
 | |
| back into modifications, but the resulting patch output is
 | |
| formatted differently for easier review in case of such
 | |
| a complete rewrite by showing the entire contents of old version
 | |
| prefixed with '-', followed by the entire contents of new
 | |
| version prefixed with '+'.
 | |
| 
 | |
| 
 | |
| diffcore-pickaxe: For Detecting Addition/Deletion of Specified String
 | |
| ---------------------------------------------------------------------
 | |
| 
 | |
| This transformation is used to find filepairs that represent
 | |
| changes that touch a specified string, and is controlled by the
 | |
| -S option and the `\--pickaxe-all` option to the git-diff-*
 | |
| commands.
 | |
| 
 | |
| When diffcore-pickaxe is in use, it checks if there are
 | |
| filepairs whose "result" side has the specified string and
 | |
| whose "original" side does not.  Such a filepair represents "the
 | |
| string appeared in this changeset".  It also checks for the
 | |
| opposite case that loses the specified string.
 | |
| 
 | |
| When `\--pickaxe-all` is not in effect, diffcore-pickaxe leaves
 | |
| only such filepairs that touch the specified string in its
 | |
| output.  When `\--pickaxe-all` is used, diffcore-pickaxe leaves all
 | |
| filepairs intact if there is such a filepair, or makes the
 | |
| output empty otherwise.  The latter behaviour is designed to
 | |
| make reviewing of the changes in the context of the whole
 | |
| changeset easier.
 | |
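The two modes can be illustrated with a short sketch. It assumes each filepair is given as a (path, original text, result text) tuple and treats a filepair as touching the string when exactly one side contains it. The function names are hypothetical.

```python
def touches(needle, original, result):
    """True if the string appears on exactly one side of the filepair."""
    return (needle in original) != (needle in result)


def diffcore_pickaxe(pairs, needle, pickaxe_all=False):
    """Filter (path, original, result) filepairs by the given string.

    Without pickaxe_all, only the touching filepairs are kept.
    With pickaxe_all, everything is kept if any filepair touches the
    string, and nothing is kept otherwise.
    """
    hits = [pair for pair in pairs if touches(needle, pair[1], pair[2])]
    if pickaxe_all:
        return list(pairs) if hits else []
    return hits


pairs = [
    ("a.c", "int x;\n", "int x;\nint frotz;\n"),  # "frotz" appears here
    ("b.c", "old\n", "new\n"),                    # never mentions "frotz"
]
only_hits = diffcore_pickaxe(pairs, "frotz")
whole_set = diffcore_pickaxe(pairs, "frotz", pickaxe_all=True)
```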
| 
 | |
| 
 | |
| diffcore-order: For Sorting the Output Based on Filenames
 | |
| ---------------------------------------------------------
 | |
| 
 | |
| This is used to reorder the filepairs according to the user's
 | |
| (or project's) taste, and is controlled by the -O option to the
 | |
| git-diff-* commands.
 | |
| 
 | |
| This takes a text file each of whose lines is a shell glob
 | |
| pattern.  Filepairs that match a glob pattern on an earlier line
 | |
| in the file are output before ones that match a later line, and
 | |
| filepairs that do not match any glob pattern are output last.
 | |
| 
 | |
| As an example, a typical orderfile for the core git probably
 | |
| would look like this:
 | |
| 
 | |
| ------------------------------------------------
 | |
| README
 | |
| Makefile
 | |
| Documentation
 | |
| *.h
 | |
| *.c
 | |
| t
 | |
| ------------------------------------------------
 |
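A minimal sketch of this ordering, using Python's `fnmatch` module for the glob matching. One assumption is made beyond the text above: a pattern is tried against each leading directory of a path as well as the full path, so that a line such as "Documentation" covers the files beneath it. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def order_rank(path, patterns):
    """Return the index of the first pattern that matches the path.

    Paths matching no pattern get a rank past the end, so they sort last.
    """
    parts = path.split("/")
    # "a/b/c" is tried as "a", then "a/b", then "a/b/c".
    candidates = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    for rank, pattern in enumerate(patterns):
        if any(fnmatchcase(candidate, pattern) for candidate in candidates):
            return rank
    return len(patterns)


def diffcore_order(paths, patterns):
    """Sort paths by pattern rank, keeping input order within a rank."""
    return sorted(paths, key=lambda path: order_rank(path, patterns))


patterns = ["README", "Makefile", "Documentation", "*.h", "*.c", "t"]
paths = ["t/t0000-basic.sh", "diff.c", "cache.h",
         "Documentation/diffcore.txt", "Makefile", "README", "COPYING"]
ordered = diffcore_order(paths, patterns)
```

Because the sort is stable, filepairs that share a rank keep the order in which they arrived from the earlier transformations.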