FLTK logo

STR #3516

FLTK matrix user chat room
(using Element browser app)   FLTK gitter user chat room   GitHub FLTK Project   FLTK News RSS Feed  
  FLTK Apps      FLTK Library      Forums      Links     Login 
 Home  |  Articles & FAQs  |  Bugs & Features  |  Documentation  |  Download  |  Screenshots  ]
 

Return to Bugs & Features | SVN ⇄ GIT ]

STR #3516

Application:FLTK Library
Status:1 - Closed w/Resolution
Priority:2 - Low, e.g. a documentation error or undocumented side-effect
Scope:3 - Applies to all machines and operating systems
Subsystem:Core Library
Summary:Fl_Clock timer inaccuracies
Version:1.4-current
Created By:AlbrechtS
Assigned To:AlbrechtS
Fix Version:1.4.0
Fix Commit:3701950a90e0b57d4a64e9ca26c1616cbc386c39
Update Notification:

Receive EMails Don't Receive EMails

Trouble Report Files:


Name/Time/Date Filename/Size  
 
#1 AlbrechtS
09:02 Mar 12, 2019
Fl_Clock_test.diff
1k
 
 
#2 AlbrechtS
09:13 Mar 12, 2019
clock_with_delta.txt
7k
 
 
#3 manolo
01:18 Mar 14, 2019
Win10inVirtualBox.log
18k
 
 
#4 AlbrechtS
04:34 Mar 15, 2019
patch_add_timeout.diff
2k
 
 
#5 AlbrechtS
04:54 Mar 15, 2019
patch_v2.diff
1k
 
     

Trouble Report Comments:


Name/Time/Date Text  
 
#1 AlbrechtS
09:02 Mar 12, 2019
The Fl_Clock widget has its own timer function 'tick()' which tries to get a timer interrupt *exactly* at the next second boundary. Due to timer inaccuracies the timeout can be triggered a short time before the next second. If this happens a *very* short timeout is scheduled to happen at the next second. This can again trigger too early and so on...

Although the effect is benign (unnecessary timer interrupts stealing CPU cycles) it should be fixed.

I observed this effect with FLTK 1.4.0 under Linux because Linux uses a much more accurate timer. Under Windows the timer seems to be always late by about 2-4 ms, hence the effect is not noticeable. I can't test macOS.

Solution: The simple solution is to schedule the timer a tiny bit later than the exact value of the next second. I found 25 ms to be a very good compromise which eliminates the extra timer events almost completely with only very rare exceptions.

Note: I considered using Fl::repeat_timeout() but this has the same issue and is unfortunately platform dependent (implementation details).

Attached file 'Fl_Clock_test.diff' adds the test code I used to Fl_Clock's tick() callback to show the effect with original and new code if someone is interested to test.

Note: this STR is mainly for documentation. I'll commit the fix ASAP.
 
 
#3 AlbrechtS
09:13 Mar 12, 2019
See logs in attached file 'clock_with_delta.txt'.

Please ignore the "value() = ..." output. This is related to another issue I'm still investigating.
 
 
#4 AlbrechtS
09:38 Mar 12, 2019
Fixed in Git repository.  
 
#5 manolo
05:03 Mar 13, 2019
Here is first a report of what happens under macOS:
earlier-than-scheduled timeouts don't seem to occur
under macOS. Thus, the problem reported in this STR
doesn't seem to apply to macOS.

Also, I'd like to propose an alternative solution to this STR,
although the current solution is perfectly efficient.
The currently implemented solution is to timeout
at each full second + 25 ms and display 'full second' as
the current time when the timer is triggered.

The alternative would be to timeout at the exact full second,
and when the timer is triggered a little earlier than scheduled
(that is the source of the problem), to display
as current time the very-close-in-the-future second
and to set the next timeout to the next full second.

The tick() function would become:

static void tick(void *v) {
  time_t sec;
  int usec;
  Fl::system_driver()->gettime(&sec, &usec);
  
  double delta = (1000000 - usec)/1000000.; // time till next second
  ulong this_sec = sec;
  // if current time is just before full second, show that full second and wait one more second
  if (usec > 900000) { delta += 1; this_sec++; }
  ((Fl_Clock*)v)->value(this_sec);
  Fl::add_timeout(delta, tick, v);
  
  fprintf(stderr, "Fl_Clock - tick(), gettime = (%lu, %6d)", sec, usec);
  fprintf(stderr, ", value() = %10lu, delta = %6.4f\n", ((Fl_Clock*)v)->value(), delta);
  fflush(stderr);
}
 
 
#6 AlbrechtS
16:13 Mar 13, 2019
Manolo, thanks for testing and for your comments and code proposal. I agree that your solution is better since it doesn't try to "fix" anything on systems that are not broken (Windows and macOS, maybe other systems as well).

I changed two minor things in your proposal and removed the test statements:

(1) There's no need to add yet another variable 'this_sec' except for logging, hence I removed it and used 'sec' directly.

(2) I believe the code is easier to understand if the comparison for early triggering is in the same units (seconds) as the following addition, hence I changed this as well (delta < 0.1).

Other than that I changed formatting and (re-)added the reference to this STR for later maintainers. Here's what I got:

static void tick(void *v) {
  time_t sec;
  int usec;
  Fl::system_driver()->gettime(&sec, &usec);
  double delta = (1000000 - usec)/1000000.; // time till next second
  // if current time is just before full second, show that full second
  // and wait one more second (STR 3516)
  if (delta < 0.1) {
    delta += 1.0;
    sec++;
  }
  ((Fl_Clock*)v)->value(sec);
  Fl::add_timeout(delta, tick, v);
}

If you agree I can commit it.
 
 
#7 AlbrechtS
16:59 Mar 13, 2019
Minimal correction: to be type conform the value() assignment should use the same cast (ulong) as in the original code:

  ((Fl_Clock*)v)->value((ulong)sec);

BTW: Tested under Linux and Windows.
 
 
#8 manolo
01:25 Mar 14, 2019
@Albrecht: Great. Ok with all proposed changes.

I could test here only with Windows10 in VirtualBox,
and got the output attached at Win10inVirtualBox.log
Earlier-than-expected timeouts do occur quite frequently,
and when it's more than 0.1 sec in advance, the clock
ticks twice in the second, which is what we wanted to
avoid in the first place.
That could well be an artefact of virtualization.
What happens on a genuine Windows box?
 
 
#9 AlbrechtS
07:41 Mar 14, 2019
@Manolo:

(1) I never saw early timeouts (before the full second) on my (bare metal) Windows 10 box. See 'clock_with_delta.txt' which includes Linux and Windows.

(2) All my Linux tests have been in a Virtualbox VM on the same Windows host (running Ubuntu 18.04). Maybe this caused the issue, given your experience with Windows in a Virtualbox VM. Unfortunately I don't have a bare metal Linux box to test with.

(3) It doesn't matter /why/ it happens.

(4) We know it /does/ happen.

The logical conclusion is (IMHO) that the best we can do is just to round to the next second boundary, i.e. change the comparison mentioned above to:

  if (delta < 0.5) { ... }

This should fix all issues in a consistent way. In the best case it won't happen at all in which case we just "round down" to the current second (as is done in the old code), but in all other cases (up to 500 ms early) we "round up" to the next second. This should guarantee only one tick per second - unless something is really going awry which should never happen but we can't do better anyway.

Okay ?
 
 
#10 manolo
09:29 Mar 14, 2019
1) I can use here FLTK under MacOS build with --enable-x11
which is effectively a genuine Unix+X11 platform.
I see there exactly what you report for Ubuntu in Virtualbox.
Thus early timeouts do occur.

2) What I get with Windows10 in Virtualbox (on MacOS)
is very variable. Most often, no timeouts earlier than 0.1 sec
seem to occur. But the attached log file is a run that
produced lots of early timeouts.

3) I'm a little bit reluctant to use the .5 sec threshold
because the clock would be .5 sec in advance at times,
and that's visible to the eye. I would prefer to keep
the .9 (or .8) sec threshold at the cost a few (mostly rare)
double ticks, but would keep the clock visually correct.
 
 
#11 manolo
09:31 Mar 14, 2019
Corrigendum: in the new parlance, the threshold
would be .1 or .2 seconds.
 
 
#12 AlbrechtS
17:22 Mar 14, 2019
OK, I agree that we should stay with a short period like 0.1 or 0.2 seconds for the "round up" case because we can't know for which reason a timer may expire early. 0.1 or 0.2 seconds early appears to be safe to exclude visible effects on the Fl_Clock widget.

Side note: ISTR that (some) posix timer functions may be triggered earlier if the program receives a signal. I don't know though if this applies here because I don't know (and don't have the time to investigate) how timers are implemented in FLTK. If anybody felt inclined...

Anyway, see my note above: 0.1 or 0.2 or anything in that range seems to be "safe" so I suggest to use the code as given in comments #6 and #7. The only question remains: which exact value to use?
 
 
#13 manolo
00:21 Mar 15, 2019
I suggest to use 0.1 sec so ahead clock displays are not/hardly visible.
All that started in the first place so multiple clocks tick together.
 
 
#14 AlbrechtS
04:34 Mar 15, 2019
I think I found the root cause, i.e. why we get these early timeouts in the first place. It doesn't have to do with Posix timers since FLTK implements its own timer queue. The culprit is in the implementation of Fl::add_timeout() in src/drivers/X11/Fl_X11_Screen_Driver.cxx, i.e. X11 specific.

Background: the current implementation basically handles add_timeout() the same way as repeat_timeout(), i.e. add_timeout() *calls* repeat_timeout(). However, repeat_timeout() intentionally *corrects* the timeout value by the value found in the global variable 'missed_timeout_by' which is set when the timer expires, directly before the timer callback is called. This variable is never reset.

The consequence in Fl_Clock/tick() is: the later the tick() timeout gets triggered, the more is subtracted from the target time value calculated by tick() and hence the earlier the next timeout occurs!

Example: let's say we missed the exact timeout by 20 ms, then tick() calculates 980 ms delta time, but Fl::add_timeout() "corrects" this by -20 ms. Hence the next timeout will occur 20 ms before the next second.

This is not only an issue with Fl_Clock. Since the 'missed_timeout_by' variable is never reset any call to Fl::add_timeout() is due to an arbitrary subtraction of the 'missed_timeout_by' value of the previous timeout, even if Fl::add_timeout() is not called in a timer callback but for instance in handling of an FL_SHOW event or in a callback related to a mouse click).

This is also violating the documentation and very likely different than on other platforms. The correction in Fl::repeat_timeout() also assumes (requires) that Fl::repeat_timeout() must only be called in a timeout callback which is probably implied but not clearly documented.

As a proof I'm attaching a patch (patch_add_timeout.diff) that patches Fl_X11_Screen_Driver::add_timeout() and adds some test statements. The patch also disables the "correction" in tick() and replaces it with a diagnostic message. The patch is pretty long, but most of the patch is test code.

The only really relevant part is:

 void Fl_X11_Screen_Driver::add_timeout(double time, Fl_Timeout_Handler cb, void *argp) {
   elapse_timeouts();
+  missed_timeout_by = 0;
   repeat_timeout(time, cb, argp);
 }

The patch of the tick() function is as proposed in comments #6/#7 but disables the correction and replaces it by the diagnostic message as written above so we can test the behavior w/o the correction in tick().

The patch also adds another test option in tick():

+#if (1) // normal behavior, works well with patched Fl::add_timeout()
+  Fl::add_timeout(delta, tick, v);
+#else // using Fl::repeat_timeout() (almost) simulates "old behavior"
+  Fl::repeat_timeout(delta, tick, v);
+#endif

The comment says it all:

(1) '#if (1)' uses Fl::add_timeout() as before which works well with the patched Fl::add_timeout() without the "correction" in tick().

(2) replace this with '#if (0)' to simulate the previous behavior of Fl::add_timeout(). This is almost the same, but not identical (it shortcuts elapse_timeouts();), but it's good enough for the test. It clearly shows the effects we saw w/o the fix in tick() before we started...

In my tests under Linux I didn't get any early timer expirations with the patch (as was to be expected).

The patch may need review though (Manolo?) and maybe some more testing and updates of the documentation of Fl::add_timeout() and Fl::repeat_timeout() to clarify the behavior.
 
 
#15 AlbrechtS
04:40 Mar 15, 2019
PS: sorry for the very long statement...

Here's a short excerpt from a test with the given patch and "simulating the old behavior":

tick: sec = 1552649831, usec=     786
-- repeat_timeout(), missed_timeout_by = -0.001887
tick: sec = 1552649832, usec=    4304
-- repeat_timeout(), missed_timeout_by = -0.006192
tick: sec = 1552649832, usec=  995542 *** Delta = 0.004458 (  4458 µsec), value =      45432 ***
-- repeat_timeout(), missed_timeout_by = -0.001733
tick: sec = 1552649832, usec=  998684 *** Delta = 0.001316 (  1316 µsec), value = 1552649832 ***
-- repeat_timeout(), missed_timeout_by = -0.000418
tick: sec = 1552649832, usec=  999928 *** Delta = 0.000072 (    72 µsec), value = 1552649832 ***
-- repeat_timeout(), missed_timeout_by = -0.000346
tick: sec = 1552649832, usec=  999974 *** Delta = 0.000026 (    26 µsec), value = 1552649832 ***
-- repeat_timeout(), missed_timeout_by = -0.000274
tick: sec = 1552649832, usec=  999997 *** Delta = 0.000003 (     3 µsec), value = 1552649832 ***
-- repeat_timeout(), missed_timeout_by = -0.000248
tick: sec = 1552649833, usec=      20
-- repeat_timeout(), missed_timeout_by = -0.000245
tick: sec = 1552649834, usec=     113
 
 
#16 AlbrechtS
04:43 Mar 15, 2019
Note: the patch (#6/#7) may still be useful since you (Manolo) found similar issues under Windows in a VM.  
 
#17 AlbrechtS
04:54 Mar 15, 2019
See attached file 'patch_v2.diff' for a potentially final patch based on current master.

I would split this in two commits though...
 
 
#18 AlbrechtS
11:28 Mar 15, 2019
Re-sending commnet #15 because it didn't reach fltk.bugs (testing).

Here's a short excerpt from a test with the given patch and "simulating the old behavior":

tick: sec = 1552649831, usec=     786
-- repeat_timeout(), missed_timeout_by = -0.001887
tick: sec = 1552649832, usec=    4304
-- repeat_timeout(), missed_timeout_by = -0.006192
tick: sec = 1552649832, usec=  995542 *** Delta = 0.004458 (  4458 µsec), value =      45432 ***
-- repeat_timeout(), missed_timeout_by = -0.001733
tick: sec = 1552649832, usec=  998684 *** Delta = 0.001316 (  1316 µsec), value = 1552649832 ***
-- repeat_timeout(), missed_timeout_by = -0.000418
tick: sec = 1552649832, usec=  999928 *** Delta = 0.000072 (    72 µsec), value = 1552649832 ***
-- repeat_timeout(), missed_timeout_by = -0.000346
tick: sec = 1552649832, usec=  999974 *** Delta = 0.000026 (    26 µsec), value = 1552649832 ***
-- repeat_timeout(), missed_timeout_by = -0.000274
tick: sec = 1552649832, usec=  999997 *** Delta = 0.000003 (     3 µsec), value = 1552649832 ***
-- repeat_timeout(), missed_timeout_by = -0.000248
tick: sec = 1552649833, usec=      20
-- repeat_timeout(), missed_timeout_by = -0.000245
tick: sec = 1552649834, usec=     113
 
 
#19 manolo
06:58 Mar 18, 2019
I confirm that I also see that earlier-than-scheduled
timeouts no longer seem to occur with the one-line
modification of Fl_X11_Screen_Driver::add_timeout()
explained in comment #14.

Thus, that change can be enough for this STR.
The changes mentionned at comment #6
could also be committed to take care of the situation seen
with Windows in Virtualbox.
I'm fine with both soltions.
 
 
#20 AlbrechtS
05:02 Mar 21, 2019
Thanks, I'll double check the add_timer patch and commit unless I see other issues with this patch. Will keep you updated.

And yes, I think that we can do both patches.

I'm currently very busy with another urgent project, so please don't hold your breath.
 
 
#21 AlbrechtS
08:43 Dec 26, 2019
commit 35a3e7cc16f8312c5a750041adf67d2e086d059b fixes the issue of Fl::add_timeout() mentioned in comment 14 and in 'patch_add_timeout.diff' (w/o debug code etc.).  
 
#22 AlbrechtS
09:17 Dec 26, 2019
Fixed in Git repository.

The (hopefully) final fix in commit 3701950a90 reverts the patch in bab61a93d6 and introduces the patch further discussed here.

I propose to close this STR (w/Resolution) but I'm leaving it open for a while for further reference and comments.
 
 
#23 AlbrechtS
03:52 Jan 02, 2020
Closed.  
     

Return to Bugs & Features ]

 
 

Comments are owned by the poster. All other content is copyright 1998-2024 by Bill Spitzak and others. This project is hosted by The FLTK Team. Please report site problems to 'erco@seriss.com'.