Browse Source
The diff-delta code can exhibit O(m*n) behavior with some patological data set where most hash entries end up in the same hash bucket. The latest code rework reduced the block size making it particularly vulnerable to this issue, but the issue was always there and can be triggered regardless of the block size. This patch does two things: 1) the hashing has been reworked to offer a better distribution to atenuate the problem a bit, and 2) a limit is imposed to the number of entries that can exist in the same hash bucket. Because of the above the code is a bit more expensive on average, but the problematic samples used to diagnoze the issue are now orders of magnitude less expensive to process with only a slight loss in compression. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>maint
Nicolas Pitre
19 years ago
committed by
Junio C Hamano
1 changed files with 56 additions and 13 deletions
Loading…
Reference in new issue