STR #3516: Fl_Clock timer inaccuracies - Fast Light Toolkit (FLTK)

STR #3516

Application:	FLTK Library
Status:	1 - Closed w/Resolution
Priority:	2 - Low, e.g. a documentation error or undocumented side-effect
Scope:	3 - Applies to all machines and operating systems
Subsystem:	Core Library
Summary:	Fl_Clock timer inaccuracies
Version:	1.4-current
Created By:	AlbrechtS
Assigned To:	AlbrechtS
Fix Version:	1.4.0
Fix Commit:	3701950a90e0b57d4a64e9ca26c1616cbc386c39
Update Notification:	Receive EMails Don't Receive EMails

Trouble Report Files:

#1	AlbrechtS 09:02 Mar 12, 2019

Fl_Clock_test.diff
1k

#2	AlbrechtS 09:13 Mar 12, 2019

clock_with_delta.txt
7k

#3	manolo 01:18 Mar 14, 2019

Win10inVirtualBox.log
18k

#4	AlbrechtS 04:34 Mar 15, 2019

patch_add_timeout.diff
2k

#5	AlbrechtS 04:54 Mar 15, 2019

patch_v2.diff
1k

Trouble Report Comments:

#1	AlbrechtS 09:02 Mar 12, 2019

The Fl_Clock widget has its own timer function 'tick()' which tries to get a timer interrupt *exactly* at the next second boundary. Due to timer inaccuracies the timeout can be triggered a short time before the next second. If this happens a *very* short timeout is scheduled to happen at the next second. This can again trigger too early and so on... Although the effect is benign (unnecessary timer interrupts stealing CPU cycles) it should be fixed. I observed this effect with FLTK 1.4.0 under Linux because Linux uses a much more accurate timer. Under Windows the timer seems to be always late by about 2-4 ms, hence the effect is not noticeable. I can't test macOS. Solution: The simple solution is to schedule the timer a tiny bit later than the exact value of the next second. I found 25 ms to be a very good compromise which eliminates the extra timer events almost completely with only very rare exceptions. Note: I considered using Fl::repeat_timeout() but this has the same issue and is unfortunately platform dependent (implementation details). Attached file 'Fl_Clock_test.diff' adds the test code I used to Fl_Clock's tick() callback to show the effect with original and new code if someone is interested to test. Note: this STR is mainly for documentation. I'll commit the fix ASAP.

#3	AlbrechtS 09:13 Mar 12, 2019

See logs in attached file 'clock_with_delta.txt'. Please ignore the "value() = ..." output. This is related to another issue I'm still investigating.

#4	AlbrechtS 09:38 Mar 12, 2019

Fixed in Git repository.

#5	manolo 05:03 Mar 13, 2019

Here is first a report of what happens under macOS: earlier-than-scheduled timeouts don't seem to occur under macOS. Thus, the problem reported in this STR doesn't seem to apply to macOS. Also, I'd like to propose an alternative solution to this STR, although the current solution is perfectly efficient. The currently implemented solution is to timeout at each full second + 25 ms and display 'full second' as the current time when the timer is triggered. The alternative would be to timeout at the exact full second, and when the timer is triggered a little earlier than scheduled (that is the source of the problem), to display as current time the very-close-in-the-future second and to set the next timeout to the next full second. The tick() function would become: static void tick(void *v) { time_t sec; int usec; Fl::system_driver()->gettime(&sec, &usec); double delta = (1000000 - usec)/1000000.; // time till next second ulong this_sec = sec; // if current time is just before full second, show that full second and wait one more second if (usec > 900000) { delta += 1; this_sec++; } ((Fl_Clock*)v)->value(this_sec); Fl::add_timeout(delta, tick, v); fprintf(stderr, "Fl_Clock - tick(), gettime = (%lu, %6d)", sec, usec); fprintf(stderr, ", value() = %10lu, delta = %6.4f\n", ((Fl_Clock*)v)->value(), delta); fflush(stderr); }

#6	AlbrechtS 16:13 Mar 13, 2019

Manolo, thanks for testing and for your comments and code proposal. I agree that your solution is better since it doesn't try to "fix" anything on systems that are not broken (Windows and macOS, maybe other systems as well). I changed two minor things in your proposal and removed the test statements: (1) There's no need to add yet another variable 'this_sec' except for logging, hence I removed it and used 'sec' directly. (2) I believe the code is easier to understand if the comparison for early triggering is in the same units (seconds) as the following addition, hence I changed this as well (delta < 0.1). Other than that I changed formatting and (re-)added the reference to this STR for later maintainers. Here's what I got: static void tick(void *v) { time_t sec; int usec; Fl::system_driver()->gettime(&sec, &usec); double delta = (1000000 - usec)/1000000.; // time till next second // if current time is just before full second, show that full second // and wait one more second (STR 3516) if (delta < 0.1) { delta += 1.0; sec++; } ((Fl_Clock*)v)->value(sec); Fl::add_timeout(delta, tick, v); } If you agree I can commit it.

#7	AlbrechtS 16:59 Mar 13, 2019

Minimal correction: to be type conform the value() assignment should use the same cast (ulong) as in the original code: ((Fl_Clock*)v)->value((ulong)sec); BTW: Tested under Linux and Windows.

#8	manolo 01:25 Mar 14, 2019

@Albrecht: Great. Ok with all proposed changes. I could test here only with Windows10 in VirtualBox, and got the output attached at Win10inVirtualBox.log Earlier-than-expected timeouts do occur quite frequently, and when it's more than 0.1 sec in advance, the clock ticks twice in the second, which is what we wanted to avoid in the first place. That could well be an artefact of virtualization. What happens on a genuine Windows box?

#9	AlbrechtS 07:41 Mar 14, 2019

@Manolo: (1) I never saw early timeouts (before the full second) on my (bare metal) Windows 10 box. See 'clock_with_delta.txt' which includes Linux and Windows. (2) All my Linux tests have been in a Virtualbox VM on the same Windows host (running Ubuntu 18.04). Maybe this caused the issue, given your experience with Windows in a Virtualbox VM. Unfortunately I don't have a bare metal Linux box to test with. (3) It doesn't matter /why/ it happens. (4) We know it /does/ happen. The logical conclusion is (IMHO) that the best we can do is just to round to the next second boundary, i.e. change the comparison mentioned above to: if (delta < 0.5) { ... } This should fix all issues in a consistent way. In the best case it won't happen at all in which case we just "round down" to the current second (as is done in the old code), but in all other cases (up to 500 ms early) we "round up" to the next second. This should guarantee only one tick per second - unless something is really going awry which should never happen but we can't do better anyway. Okay ?

#10	manolo 09:29 Mar 14, 2019

1) I can use here FLTK under MacOS build with --enable-x11 which is effectively a genuine Unix+X11 platform. I see there exactly what you report for Ubuntu in Virtualbox. Thus early timeouts do occur. 2) What I get with Windows10 in Virtualbox (on MacOS) is very variable. Most often, no timeouts earlier than 0.1 sec seem to occur. But the attached log file is a run that produced lots of early timeouts. 3) I'm a little bit reluctant to use the .5 sec threshold because the clock would be .5 sec in advance at times, and that's visible to the eye. I would prefer to keep the .9 (or .8) sec threshold at the cost a few (mostly rare) double ticks, but would keep the clock visually correct.

#11	manolo 09:31 Mar 14, 2019

Corrigendum: in the new parlance, the threshold would be .1 or .2 seconds.

#12	AlbrechtS 17:22 Mar 14, 2019

OK, I agree that we should stay with a short period like 0.1 or 0.2 seconds for the "round up" case because we can't know for which reason a timer may expire early. 0.1 or 0.2 seconds early appears to be safe to exclude visible effects on the Fl_Clock widget. Side note: ISTR that (some) posix timer functions may be triggered earlier if the program receives a signal. I don't know though if this applies here because I don't know (and don't have the time to investigate) how timers are implemented in FLTK. If anybody felt inclined... Anyway, see my note above: 0.1 or 0.2 or anything in that range seems to be "safe" so I suggest to use the code as given in comments #6 and #7. The only question remains: which exact value to use?

#13	manolo 00:21 Mar 15, 2019

I suggest to use 0.1 sec so ahead clock displays are not/hardly visible. All that started in the first place so multiple clocks tick together.

#14	AlbrechtS 04:34 Mar 15, 2019

I think I found the root cause, i.e. why we get these early timeouts in the first place. It doesn't have to do with Posix timers since FLTK implements its own timer queue. The culprit is in the implementation of Fl::add_timeout() in src/drivers/X11/Fl_X11_Screen_Driver.cxx, i.e. X11 specific. Background: the current implementation basically handles add_timeout() the same way as repeat_timeout(), i.e. add_timeout() *calls* repeat_timeout(). However, repeat_timeout() intentionally *corrects* the timeout value by the value found in the global variable 'missed_timeout_by' which is set when the timer expires, directly before the timer callback is called. This variable is never reset. The consequence in Fl_Clock/tick() is: the later the tick() timeout gets triggered, the more is subtracted from the target time value calculated by tick() and hence the earlier the next timeout occurs! Example: let's say we missed the exact timeout by 20 ms, then tick() calculates 980 ms delta time, but Fl::add_timeout() "corrects" this by -20 ms. Hence the next timeout will occur 20 ms before the next second. This is not only an issue with Fl_Clock. Since the 'missed_timeout_by' variable is never reset any call to Fl::add_timeout() is due to an arbitrary subtraction of the 'missed_timeout_by' value of the previous timeout, even if Fl::add_timeout() is not called in a timer callback but for instance in handling of an FL_SHOW event or in a callback related to a mouse click). This is also violating the documentation and very likely different than on other platforms. The correction in Fl::repeat_timeout() also assumes (requires) that Fl::repeat_timeout() must only be called in a timeout callback which is probably implied but not clearly documented. As a proof I'm attaching a patch (patch_add_timeout.diff) that patches Fl_X11_Screen_Driver::add_timeout() and adds some test statements. The patch also disables the "correction" in tick() and replaces it with a diagnostic message. The patch is pretty long, but most of the patch is test code. The only really relevant part is: void Fl_X11_Screen_Driver::add_timeout(double time, Fl_Timeout_Handler cb, void *argp) { elapse_timeouts(); + missed_timeout_by = 0; repeat_timeout(time, cb, argp); } The patch of the tick() function is as proposed in comments #6/#7 but disables the correction and replaces it by the diagnostic message as written above so we can test the behavior w/o the correction in tick(). The patch also adds another test option in tick(): +#if (1) // normal behavior, works well with patched Fl::add_timeout() + Fl::add_timeout(delta, tick, v); +#else // using Fl::repeat_timeout() (almost) simulates "old behavior" + Fl::repeat_timeout(delta, tick, v); +#endif The comment says it all: (1) '#if (1)' uses Fl::add_timeout() as before which works well with the patched Fl::add_timeout() without the "correction" in tick(). (2) replace this with '#if (0)' to simulate the previous behavior of Fl::add_timeout(). This is almost the same, but not identical (it shortcuts elapse_timeouts();), but it's good enough for the test. It clearly shows the effects we saw w/o the fix in tick() before we started... In my tests under Linux I didn't get any early timer expirations with the patch (as was to be expected). The patch may need review though (Manolo?) and maybe some more testing and updates of the documentation of Fl::add_timeout() and Fl::repeat_timeout() to clarify the behavior.

#15	AlbrechtS 04:40 Mar 15, 2019

PS: sorry for the very long statement... Here's a short excerpt from a test with the given patch and "simulating the old behavior": tick: sec = 1552649831, usec= 786 -- repeat_timeout(), missed_timeout_by = -0.001887 tick: sec = 1552649832, usec= 4304 -- repeat_timeout(), missed_timeout_by = -0.006192 tick: sec = 1552649832, usec= 995542 *** Delta = 0.004458 ( 4458 µsec), value = 45432 *** -- repeat_timeout(), missed_timeout_by = -0.001733 tick: sec = 1552649832, usec= 998684 *** Delta = 0.001316 ( 1316 µsec), value = 1552649832 *** -- repeat_timeout(), missed_timeout_by = -0.000418 tick: sec = 1552649832, usec= 999928 *** Delta = 0.000072 ( 72 µsec), value = 1552649832 *** -- repeat_timeout(), missed_timeout_by = -0.000346 tick: sec = 1552649832, usec= 999974 *** Delta = 0.000026 ( 26 µsec), value = 1552649832 *** -- repeat_timeout(), missed_timeout_by = -0.000274 tick: sec = 1552649832, usec= 999997 *** Delta = 0.000003 ( 3 µsec), value = 1552649832 *** -- repeat_timeout(), missed_timeout_by = -0.000248 tick: sec = 1552649833, usec= 20 -- repeat_timeout(), missed_timeout_by = -0.000245 tick: sec = 1552649834, usec= 113

#16	AlbrechtS 04:43 Mar 15, 2019

Note: the patch (#6/#7) may still be useful since you (Manolo) found similar issues under Windows in a VM.

#17	AlbrechtS 04:54 Mar 15, 2019

See attached file 'patch_v2.diff' for a potentially final patch based on current master. I would split this in two commits though...

#18	AlbrechtS 11:28 Mar 15, 2019

Re-sending commnet #15 because it didn't reach fltk.bugs (testing). Here's a short excerpt from a test with the given patch and "simulating the old behavior": tick: sec = 1552649831, usec= 786 -- repeat_timeout(), missed_timeout_by = -0.001887 tick: sec = 1552649832, usec= 4304 -- repeat_timeout(), missed_timeout_by = -0.006192 tick: sec = 1552649832, usec= 995542 *** Delta = 0.004458 ( 4458 µsec), value = 45432 *** -- repeat_timeout(), missed_timeout_by = -0.001733 tick: sec = 1552649832, usec= 998684 *** Delta = 0.001316 ( 1316 µsec), value = 1552649832 *** -- repeat_timeout(), missed_timeout_by = -0.000418 tick: sec = 1552649832, usec= 999928 *** Delta = 0.000072 ( 72 µsec), value = 1552649832 *** -- repeat_timeout(), missed_timeout_by = -0.000346 tick: sec = 1552649832, usec= 999974 *** Delta = 0.000026 ( 26 µsec), value = 1552649832 *** -- repeat_timeout(), missed_timeout_by = -0.000274 tick: sec = 1552649832, usec= 999997 *** Delta = 0.000003 ( 3 µsec), value = 1552649832 *** -- repeat_timeout(), missed_timeout_by = -0.000248 tick: sec = 1552649833, usec= 20 -- repeat_timeout(), missed_timeout_by = -0.000245 tick: sec = 1552649834, usec= 113

#19	manolo 06:58 Mar 18, 2019

I confirm that I also see that earlier-than-scheduled timeouts no longer seem to occur with the one-line modification of Fl_X11_Screen_Driver::add_timeout() explained in comment #14. Thus, that change can be enough for this STR. The changes mentionned at comment #6 could also be committed to take care of the situation seen with Windows in Virtualbox. I'm fine with both soltions.

#20	AlbrechtS 05:02 Mar 21, 2019

Thanks, I'll double check the add_timer patch and commit unless I see other issues with this patch. Will keep you updated. And yes, I think that we can do both patches. I'm currently very busy with another urgent project, so please don't hold your breath.

#21	AlbrechtS 08:43 Dec 26, 2019

commit 35a3e7cc16f8312c5a750041adf67d2e086d059b fixes the issue of Fl::add_timeout() mentioned in comment 14 and in 'patch_add_timeout.diff' (w/o debug code etc.).

#22	AlbrechtS 09:17 Dec 26, 2019

Fixed in Git repository. The (hopefully) final fix in commit 3701950a90 reverts the patch in bab61a93d6 and introduces the patch further discussed here. I propose to close this STR (w/Resolution) but I'm leaving it open for a while for further reference and comments.

#23	AlbrechtS 03:52 Jan 02, 2020

Closed.