From 4a44a54b3caf77923f0e3f1d5bdf5eda6ef07f62 Mon Sep 17 00:00:00 2001
From: Chris MacGregor <chrismacgregor@google.com>
Date: Thu, 27 Feb 2014 10:40:59 -0800
Subject: [PATCH] hwclock: fix possible hang and other
 set_hardware_clock_exact() issues

In sys-utils/hwclock.c, set_hardware_clock_exact() has some problems when the
process gets pre-empted (for more than 100ms) before reaching the time for
which it waits:

1. The "continue" statement causes execution to skip the final tdiff
assignment at the end of the do...while loop, leading to the while condition
using the wrong value of tdiff, and thus always exiting the loop once
newhwtime != sethwtime (e.g., after 1 second). This masks bug #2, below.

2. The previously-existing bug is that because it starts over waiting for the
desired time whenever two successive calls to gettimeofday() return values >
100ms apart, the loop will never terminate unless the process holds the CPU
(without losing it for more than 100ms) for at least 500ms. This can happen
on a heavily loaded machine or on a virtual machine (or on a heavily loaded
virtual machine). This has been observed to occur, preventing a machine from
completing the shutdown or reboot process due to a "hwclock --systohc" call in
a shutdown script.

The new implementation presented in this patch takes a somewhat different
approach, intended to accomplish the same goals:

It computes the desired target system time (at which the requested hardware
clock time will be applied to the hardware clock), and waits for that time to
arrive. If it misses the time (such as due to being pre-empted for too long),
it recalculates the target time, and increases the tolerance (how late it can
be relative to the target time, and still be "close enough"). Thus, if all is
well, the time will be set *very* precisely. On a machine where the hwclock
process is repeatedly pre-empted, it will set the time as precisely as is
possible under the conditions present on that particular machine. In any
case, it will always terminate eventually (and pretty quickly); it will never
hang forever.

[kzak@redhat.com: - tiny coding style changes]

Signed-off-by: Chris MacGregor <chrismacgregor@google.com>
Signed-off-by: Karel Zak <kzak@redhat.com>
---
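The retargeting arithmetic described in the commit message can be sketched in isolation as follows. This is a minimal illustration only, not code from the patch: the helper names diff_secs(), retarget(), hw_offset_secs() and tolerance_after_misses() are hypothetical, and diff_secs() merely stands in for hwclock's time_diff(). Times are plain struct timeval values, so no clock is actually read.

```c
#include <sys/time.h>

/* 500 ms: the RTC ticks to the next second half a second after it is set. */
#define RTC_SET_DELAY_SECS 0.5

/* Seconds elapsed from `earlier` to `later` (stand-in for time_diff()). */
static double diff_secs(struct timeval later, struct timeval earlier)
{
	return (double)(later.tv_sec - earlier.tv_sec)
	     + (later.tv_usec - earlier.tv_usec) / 1E6;
}

/*
 * After a miss, keep the target's sub-second offset (tv_usec) and move
 * it to the current second if that offset has not arrived yet, or to
 * the next second otherwise.
 */
static struct timeval retarget(struct timeval now, struct timeval target)
{
	if (now.tv_usec < target.tv_usec)
		target.tv_sec = now.tv_sec;
	else
		target.tv_sec = now.tv_sec + 1;
	return target;
}

/*
 * Whole seconds to add to the requested hardware time when the write
 * happens at `now`: elapsed time since the reference, minus the 500 ms
 * set delay, rounded to the nearest second.
 */
static long hw_offset_secs(struct timeval now, struct timeval ref)
{
	return (long)(diff_secs(now, ref) - RTC_SET_DELAY_SECS + 0.5);
}

/*
 * Tolerance in seconds after `misses` missed windows: it starts at 1 ms
 * and grows by an increment that itself doubles, so it doubles per miss.
 */
static double tolerance_after_misses(int misses)
{
	double tolerance = 0.001;
	double incr = 0.001;

	while (misses-- > 0) {
		tolerance += incr;
		incr *= 2;
	}
	return tolerance;
}
```

Taking the reference time 6:07:08.250 as { 22028, 250000 } (seconds since midnight, chosen only for illustration) and a wake-up at 6:07:11.990, retarget() moves the target from 6:07:08.750 to 6:07:12.750, and hw_offset_secs() evaluated at that instant gives 4, so the hardware clock would be set 4 seconds past the requested time. Because the tolerance doubles on every miss, it exceeds one second after ten misses, which is why the loop is expected to terminate.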
 sys-utils/hwclock.c | 170 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 131 insertions(+), 39 deletions(-)

diff --git a/sys-utils/hwclock.c b/sys-utils/hwclock.c
index 30660d4..395b5c3 100644
--- a/sys-utils/hwclock.c
+++ b/sys-utils/hwclock.c
@@ -125,7 +125,7 @@ struct adjtime {
  * We are running in debug mode, wherein we put a lot of information about
  * what we're doing to standard output.
  */
-bool debug;
+int debug;

 /* Workaround for Award 4.50g BIOS bug: keep the year in a file. */
 bool badyear;
@@ -526,43 +526,141 @@ set_hardware_clock_exact(const time_t sethwtime,
 			 const struct timeval refsystime,
 			 const bool universal, const bool testing)
 {
-	time_t newhwtime = sethwtime;
-	struct timeval beginsystime, nowsystime;
-	double tdiff;
-	int time_resync = 1;
-
 	/*
-	 * Now delay some more until Hardware Clock time newhwtime arrives.
-	 * The 0.5 s is because the Hardware Clock always sets to your set
-	 * time plus 500 ms (because it is designed to update to the next
-	 * second precisely 500 ms after you finish the setting).
+	 * The Hardware Clock can only be set to any integer time plus one
+	 * half second. The integer time is required because there is no
+	 * interface to set or get a fractional second. The additional half
+	 * second is because the Hardware Clock updates to the following
+	 * second precisely 500 ms (not 1 second!) after you release the
+	 * divider reset (after setting the new time) - see description of
+	 * DV2, DV1, DV0 in Register A in the MC146818A data sheet (and note
+	 * that although that document doesn't say so, real-world code seems
+	 * to expect that the SET bit in Register B functions the same way).
+	 * That means that, e.g., when you set the clock to 1:02:03, it
+	 * effectively really sets it to 1:02:03.5, because it will update to
+	 * 1:02:04 only half a second later. Our caller passes the desired
+	 * integer Hardware Clock time in sethwtime, and the corresponding
+	 * system time (which may have a fractional part, and which may or may
+	 * not be the same!) in refsystime. In an ideal situation, we would
+	 * then apply sethwtime to the Hardware Clock at refsystime+500ms, so
+	 * that when the Hardware Clock ticks forward to sethwtime+1s half a
+	 * second later at refsystime+1000ms, everything is in sync. So we
+	 * spin, waiting for gettimeofday() to return a time at or after that
+	 * time (refsystime+500ms) up to a tolerance value, initially 1ms. If
+	 * we miss that time due to being preempted for some other process,
+	 * then we increase the margin a little bit (initially 1ms, doubling
+	 * each time), add 1 second (or more, if needed to get a time that is
+	 * in the future) to both the time for which we are waiting and the
+	 * time that we will apply to the Hardware Clock, and start waiting
+	 * again.
+	 *
+	 * For example, the caller requests that we set the Hardware Clock to
+	 * 1:02:03, with reference time (current system time) = 6:07:08.250.
+	 * We want the Hardware Clock to update to 1:02:04 at 6:07:09.250 on
+	 * the system clock, and the first such update will occur 0.500
+	 * seconds after we write to the Hardware Clock, so we spin until the
+	 * system clock reads 6:07:08.750. If we get there, great, but let's
+	 * imagine the system is so heavily loaded that our process is
+	 * preempted and by the time we get to run again, the system clock
+	 * reads 6:07:11.990. We now want to wait until the next xx:xx:xx.750
+	 * time, which is 6:07:12.750 (4.5 seconds after the reference time),
+	 * at which point we will set the Hardware Clock to 1:02:07 (4 seconds
+	 * after the originally requested time). If we do that successfully,
+	 * then at 6:07:13.250 (5 seconds after the reference time), the
+	 * Hardware Clock will update to 1:02:08 (5 seconds after the
+	 * originally requested time), and all is well thereafter.
 	 */
-	do {
-		if (time_resync) {
-			gettimeofday(&beginsystime, NULL);
-			tdiff = time_diff(beginsystime, refsystime);
-			newhwtime = sethwtime + (int)(tdiff + 0.5);
-			if (debug)
-				printf(_
-				       ("Time elapsed since reference time has been %.6f seconds.\n"
-					"Delaying further to reach the new time.\n"),
-				       tdiff);
-			time_resync = 0;
+
+	time_t newhwtime = sethwtime;
+	double target_time_tolerance_secs = 0.001; /* initial value */
+	double tolerance_incr_secs = 0.001; /* initial value */
+	const double RTC_SET_DELAY_SECS = 0.5; /* 500 ms */
+	const struct timeval RTC_SET_DELAY_TV = { 0, RTC_SET_DELAY_SECS * 1E6 };
+
+	struct timeval targetsystime;
+	struct timeval nowsystime;
+	struct timeval prevsystime = refsystime;
+	double deltavstarget;
+
+	timeradd(&refsystime, &RTC_SET_DELAY_TV, &targetsystime);
+
+	while (1) {
+		double ticksize;
+
+		/* FOR TESTING ONLY: inject random delays of up to 1000ms */
+		if (debug >= 10) {
+			int usec = random() % 1000000;
+			printf(_("sleeping ~%d usec\n"), usec);
+			usleep(usec);
 		}

 		gettimeofday(&nowsystime, NULL);
-		tdiff = time_diff(nowsystime, beginsystime);
-		if (tdiff < 0) {
-			time_resync = 1; /* probably backward time reset */
-			continue;
-		}
-		if (tdiff > 0.1) {
-			time_resync = 1; /* probably forward time reset */
-			continue;
+		deltavstarget = time_diff(nowsystime, targetsystime);
+		ticksize = time_diff(nowsystime, prevsystime);
+		prevsystime = nowsystime;
+
+		if (ticksize < 0) {
+			if (debug)
+				printf(_("time jumped backward %.6f seconds "
+					 "to %ld.%06d - retargeting\n"),
+				       ticksize, (long)nowsystime.tv_sec,
+				       (int)nowsystime.tv_usec);
+			/* The retarget is handled at the end of the loop. */
+		} else if (deltavstarget < 0) {
+			/* deltavstarget < 0 if current time < target time */
+			if (debug >= 2)
+				printf(_("%ld.%06d < %ld.%06d (%.6f)\n"),
+				       (long)nowsystime.tv_sec,
+				       (int)nowsystime.tv_usec,
+				       (long)targetsystime.tv_sec,
+				       (int)targetsystime.tv_usec,
+				       deltavstarget);
+			continue; /* not there yet - keep spinning */
+		} else if (deltavstarget <= target_time_tolerance_secs) {
+			/* Close enough to the target time; done waiting. */
+			break;
+		} else /* (deltavstarget > target_time_tolerance_secs) */ {
+			/*
+			 * We missed our window. Increase the tolerance and
+			 * aim for the next opportunity.
+			 */
+			if (debug)
+				printf(_("missed it - %ld.%06d is too far "
+					 "past %ld.%06d (%.6f > %.6f)\n"),
+				       (long)nowsystime.tv_sec,
+				       (int)nowsystime.tv_usec,
+				       (long)targetsystime.tv_sec,
+				       (int)targetsystime.tv_usec,
+				       deltavstarget,
+				       target_time_tolerance_secs);
+			target_time_tolerance_secs += tolerance_incr_secs;
+			tolerance_incr_secs *= 2;
 		}
-		beginsystime = nowsystime;
-		tdiff = time_diff(nowsystime, refsystime);
-	} while (newhwtime == sethwtime + (int)(tdiff + 0.5));
+
+		/*
+		 * Aim for the same offset (tv_usec) within the second in
+		 * either the current second (if that offset hasn't arrived
+		 * yet), or the next second.
+		 */
+		if (nowsystime.tv_usec < targetsystime.tv_usec)
+			targetsystime.tv_sec = nowsystime.tv_sec;
+		else
+			targetsystime.tv_sec = nowsystime.tv_sec + 1;
+	}
+
+	newhwtime = sethwtime
+		    + (int)(time_diff(nowsystime, refsystime)
+			    - RTC_SET_DELAY_SECS /* don't count this */
+			    + 0.5 /* for rounding */);
+	if (debug)
+		printf(_("%ld.%06d is close enough to %ld.%06d (%.6f < %.6f)\n"
+			 "Set RTC to %ld (%ld + %d; refsystime = %ld.%06d)\n"),
+		       (long)nowsystime.tv_sec, (int)nowsystime.tv_usec,
+		       (long)targetsystime.tv_sec, (int)targetsystime.tv_usec,
+		       deltavstarget, target_time_tolerance_secs,
+		       (long)newhwtime, (long)sethwtime,
+		       (int)(newhwtime - sethwtime),
+		       (long)refsystime.tv_sec, (int)refsystime.tv_usec);

 	set_hardware_clock(newhwtime, universal, testing);
 }
@@ -1636,7 +1734,7 @@ int main(int argc, char **argv)

 		switch (c) {
 		case 'D':
-			debug = TRUE;
+			++debug;
 			break;
 		case 'a':
 			adjust = TRUE;
@@ -1953,10 +2051,4 @@ void __attribute__((__noreturn__)) hwaudit_exit(int status)
  *
  * hwclock uses this method, and considers the Hardware Clock to have
  * infinite precision.
- *
- * TODO: Enhancements needed:
- *
- * - When waiting for whole second boundary in set_hardware_clock_exact,
- *   fail if we miss the goal by more than .1 second, as could happen if we
- *   get pre-empted (by the kernel dispatcher).
  */
-- 
1.9.3