STR #3143


Application: FLTK Library
Status: 1 - Closed w/Resolution
Priority: 4 - High, e.g. key functionality not working
Scope: 2 - Specific to an operating system
Subsystem: WIN32
Summary: After returning from modal operation, the main loop does not handle awakes (Win32).
Version: 1.3.3
Created By: matszpk
Assigned To: ianmacarthur
Fix Version: 1.3.4 (SVN: v10714)

Trouble Report Files:


#   Name          Time/Date            Filename                  Size
#1  matszpk       09:03 Oct 24, 2014   threadhangup.cpp          4k
#2  matszpk       09:41 Oct 24, 2014   threadhangup2.cpp         4k
#3  ianmacarthur  08:30 Apr 21, 2015   str-3143-threads.cxx      3k
#4  ianmacarthur  08:46 Apr 21, 2015   str-3143-threads-2.cxx    4k
#5  ianmacarthur  14:16 Apr 21, 2015   str-3143-threads-3.cxx    4k


Trouble Report Comments:


Name/Time/Date Text  
 
#1 matszpk
09:53 Oct 23, 2014
This bug was encountered during testing my application on Windows OS. Because the program is multithreaded, I used Fl::awake to implement communication between threads.
However, I observed dangerous behavior on Windows: after a modal operation (moving or resizing any window), the main thread, while in the wait loop, did not handle the previously sent awakes immediately.
The main thread (GUI thread) handled those earlier awakes only after another awake was sent following the modal operation (window move/resize).
In some cases this bug causes the program to hang, for example when stopping a thread: the thread sends its end-of-work message with awake, and that awake is never handled by the main thread.

A possible workaround for new applications:
at the end of its work, the thread sends awakes repeatedly until the main thread confirms that it has received the previous ones.
Like this:

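void threadfunc()
{
   // ... some operations ...
   // end of work: queue the end-of-thread handler;
   // Fl::awake(handler, data) returns non-zero while the awake queue is full, so retry
   while (Fl::awake(endOfThread, data) != 0) {}
   // bug workaround: keep waking the main thread until it confirms the notification
   // (mainThreadIsNotified is set by the main thread, e.g. in endOfThread;
   //  it should be volatile or atomic)
   while (!mainThreadIsNotified)
   { Fl::awake(); usleep(100000); }
}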

I would be happy if this bug were fixed in the next FLTK version.

Sorry for my English :(
 
 
#2 AlbrechtS
03:18 Oct 24, 2014
Thanks for the report. But we need some more info to track this down.

You wrote "This bug was encountered during testing my application on Windows OS". Does the same program behave differently on other OS's? If yes, on which OS's does it work as you expected?

Are you using a threading library (e.g. boost), or pthreads, or Windows native threads? If on different OS's, which one on each OS?

From what I know about the implementation of Fl::awake() on Windows (IIRC), it sends a Windows message to the Windows event queue. We know that events are not handled while resizing or dragging windows, but they should be buffered and handled after the drag/resize is done. What happens if you move the mouse over the main window or click a button in the main window after the problem occurred? If it was a Windows message problem, this should trigger the FLTK event loop and eventually process the awake message.

Threading issues are difficult to debug. It's not clear from your description if this is really a FLTK problem. There can be other issues in your threading logic. I'm particularly wondering why the entire program would hang if a thread's awake message was not processed. Is there another thread that depends on this one? Wouldn't it be better to synchronize the threads by other means than sending Fl::awake() messages?

To make things clear and easier to debug, can you post a small, compilable example program that shows the issue, together with a description of how you can make it happen? This would be very helpful.

Otherwise I'm afraid we might not be able to find the cause of what you described, before we can see that it happens in a simple test case, and how it can be reproduced.
 
 
#3 matszpk
09:12 Oct 24, 2014
Hi. Problem occurs only for Windows version.
I posted a simple example (threadhangup.cpp) that shows the problem. By default the program uses C++11 threads. After uncommenting USE_NATIVE_THREADS the program uses Windows native threads. The problem exists in both cases, regardless of the threading library used.
Usage: just click Start and wait for 5 seconds. Meanwhile you can drag the window and drop it, then observe what the program does afterwards (or does not do).
 
 
#4 matszpk
09:43 Oct 24, 2014
I am sorry for the bugs in my example. I have uploaded a fixed version.
The bug was in freeing resources: the new version does not close a NULL handle.
I apologize for any inconvenience.
 
 
#5 AlbrechtS
10:04 Oct 24, 2014
Thanks for the demo program, I'll have a look at it later. Maybe someone else can take a look as well.

You wrote "Problem occurs only for Windows version". Does this mean that both Linux (Unix) AND Mac OS X work, or did you only test one or the other?
 
 
#6 matszpk
10:11 Oct 24, 2014
I tested my program only under Linux and Windows. I don't know how this example behaves under Mac OS X.
 
#7 matszpk
10:14 Nov 04, 2014
Is there any progress in resolving this bug? Is any help needed? Please notify me (by posting on this thread) about progress or any help that is needed.
 
#8 AlbrechtS
03:30 Nov 05, 2014
You will be notified if there is progress, but we are a tiny group of developers, and it may take some time before someone will take this STR and try to resolve the issue.

That said, I tried to compile your program under Windows (MinGW) and Linux. Neither compiled OOTB, but this was easy to fix:

(1) Windows:
(1a) uncomment #define USE_NATIVE_THREADS 1
(1b) #define nullptr NULL

(2) Linux: I had to compile with -std=c++11 (program uses C++11 threads)

My MinGW compiler didn't compile the demo with -std=c++11.

Anyway (see follow-up) ...
 
 
#9 AlbrechtS
03:34 Nov 05, 2014
I can confirm the hangup of the given demo program under Windows, when clicking the START button and moving the window around until the thread terminates. I even had to kill the program with the Task Manager.

The program continues to work though, if you stop moving the window before the "Finish" message would occur in the background.

To be precise:

(1) click START, wait for Message:0, move window for 2 sec. -> continues
(2) click START, wait for Message:3, move window for 3+ sec. -> hangup

Both test cases work under Linux (program continues).

So far, so good (or bad). This doesn't mean that the fault is on the FLTK side.

My available time doesn't permit deeper investigation. Maybe someone who is more familiar with thread programming can take a look at the code. Ian? Greg?

Please don't hold your breath though...
 
 
#10 AlbrechtS
06:13 Apr 21, 2015
Any comments from our threading specialists?  
 
#11 ianmacarthur
08:38 Apr 21, 2015
Added a slightly revised, and WIN32 only, version of the test code.

This exhibits the failure, but has some printf's added so we can see that the program is in fact still running.

Also added a heartbeat mechanism that will flush the awake() queue periodically; doing so gets the program out of its "locked up" state and continues normally - note that when it does so, all the "pending" awake messages also flush out.

(dangerous speculation follows - I have not investigated this)

So, all the messages are in the awake queue, but do not get flushed until it is "nudged" again, and other events seem to be largely ignored until the awake queue has flushed... so we end up in some sort of deadlock here.

Win32 will not process events whilst a window is moving or resizing, which seems to be how the events get stuck in the awake queue in the first place, but why they do not immediately flush once the window stops moving I cannot say.

Nor why other events are not serviced either, of course...

More investigation is needed.
 
 
#12 ianmacarthur
08:48 Apr 21, 2015
Actually, I think non-awake events are still processed normally.

It just looks like "the system" doesn't know there is anything pending in the awake queue and ignores them, until it gets a little nudge to flush the awake queue and everything springs back into life.

See str-3143-threads-2.cxx as an example...

fltk-config --compile str-3143-threads-2.cxx

Win32 only.
 
 
#13 ianmacarthur
14:17 Apr 21, 2015
Added file
str-3143-threads-3.cxx

This builds OK for Win32 and for OSX. OSX does not exhibit the failure mode.

fltk-config --compile str-3143-threads-3.cxx
 
 
#14 ianmacarthur
15:32 Apr 21, 2015
Some very basic analysis:

When we do an awake() under Win32, then in Fl_lock.cxx, circa line 244, we PostThreadMessage() from the worker thread to wake up the main thread.

The main thread catches these messages in fl_wait(), using PeekMessageW(), see Fl_win32.cxx, circa line 415.

This then checks to see if the message is of type fl_wake_msg, and if so triggers any pending awake message callbacks that may be queued (indeed at this point it will flush any pending awake message callbacks, not just the one associated with the PostThreadMessage() that triggered the checks...)

However, when the window is moving or resizing, it seems that fl_wait() doesn't process the posted thread messages, nor do they appear to be queued anywhere, so it looks like any call to PostThreadMessage() made whilst the window in the main thread is "moving" is just getting lost.

The awake callbacks still get queued into the awake callback queue, but fl_wait() does not know this, nor does it execute them.

Any subsequent call to Fl::awake() will get them executed however.

Options?

a) figure out what happens to the "lost" PostThreadMessage's

b) have fl_wait() check if the awake ring buffer is non-empty and, if it is, flush it

c) something else?
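
The failure mode described above can be sketched as a small single-threaded toy model. This is not FLTK code: ToyLoop, PhaseCounts and simulate_lost_wakeup are hypothetical names, and the rule that a wake-up message posted during a move is simply dropped is an assumption taken from the analysis above, not something confirmed about Win32.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy model only: none of these names exist in FLTK.
struct ToyLoop {
    std::deque<std::function<void()>> awake_queue; // stands in for the awake callback queue
    int wake_messages = 0;                          // stands in for posted wake-up messages
    bool in_modal_move = false;                     // true while the window is being dragged

    // Worker side: always queue the callback, then try to post a wake-up message.
    void awake(std::function<void()> cb) {
        awake_queue.push_back(std::move(cb));
        if (!in_modal_move) ++wake_messages; // ASSUMPTION: the message is lost during a move
    }

    // Main-thread side: only a wake-up message triggers a flush of the whole queue.
    int wait_once() {
        int ran = 0;
        if (wake_messages > 0) {
            wake_messages = 0;
            while (!awake_queue.empty()) {
                std::function<void()> cb = std::move(awake_queue.front());
                awake_queue.pop_front();
                cb();
                ++ran;
            }
        }
        return ran;
    }
};

struct PhaseCounts {
    int after_move;       // callbacks run right after the move ended
    int after_next_awake; // callbacks run after one more awake was sent
};

PhaseCounts simulate_lost_wakeup() {
    ToyLoop loop;
    int handled = 0;
    loop.in_modal_move = true;
    loop.awake([&handled] { ++handled; }); // queued, but its wake-up message is lost
    loop.awake([&handled] { ++handled; }); // same again
    loop.in_modal_move = false;
    PhaseCounts counts;
    counts.after_move = loop.wait_once();       // nothing tells the loop about the queue
    loop.awake([&handled] { ++handled; });      // a later awake posts a message normally
    counts.after_next_awake = loop.wait_once(); // flushes the two stale callbacks as well
    assert(handled == counts.after_next_awake);
    return counts;
}
```

In this model the two callbacks queued during the move stay pending until a later awake arrives, which matches the observation that any subsequent call to Fl::awake() gets them executed.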
 
 
#15 ianmacarthur
15:44 Apr 21, 2015
OK, this seems to work: Not sure if it's a good solution or not though...

Adding the following immediately before the call to Fl::flush() at about line 435 of Fl_win32.cxx:


  if (Fl::awake_ring_head_ != Fl::awake_ring_tail_)
  {
    Fl_Awake_Handler func;
    void *data;
    while (Fl::get_awake_handler_(func, data)==0) {
      func(data);
    }
  }


Will get things going again.

Basically, once you stop dragging the window about, and fl_wait() gets a chance to run, it notices that the awake cb ring buffer is not empty and flushes any pending callbacks.

This seems to be what we want, though I remain uncertain....
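
The idea behind the patch can be tried outside FLTK with a minimal stand-in ring buffer. This is only a sketch: AwakeRing, flush_pending, count_call and demo_flush are hypothetical names, the real ring buffer lives inside class Fl and is protected by a lock, and this version is single-threaded.

```cpp
#include <cassert>

// Hypothetical stand-ins, not FLTK's types.
typedef void (*AwakeHandler)(void *);

class AwakeRing {
public:
    // Queue a handler; returns 0 on success, -1 when the ring is full.
    int add(AwakeHandler func, void *data) {
        int next = (head_ + 1) % kSize;
        if (next == tail_) return -1;
        funcs_[head_] = func;
        data_[head_] = data;
        head_ = next;
        return 0;
    }
    // Pop the oldest handler; returns 0 on success, -1 when the ring is empty.
    int get(AwakeHandler &func, void *&data) {
        if (head_ == tail_) return -1;
        func = funcs_[tail_];
        data = data_[tail_];
        tail_ = (tail_ + 1) % kSize;
        return 0;
    }
    // Same test as the patch: head != tail means something is pending.
    bool pending() const { return head_ != tail_; }

private:
    static const int kSize = 16;
    AwakeHandler funcs_[kSize];
    void *data_[kSize];
    int head_ = 0;
    int tail_ = 0;
};

// What the wait loop would call on every pass, whether or not a wake-up
// message arrived: run every queued handler and report how many ran.
int flush_pending(AwakeRing &ring) {
    int ran = 0;
    if (ring.pending()) {
        AwakeHandler func = nullptr;
        void *data = nullptr;
        while (ring.get(func, data) == 0) {
            func(data);
            ++ran;
        }
    }
    return ran;
}

// Example handler: increments the int that data points to.
void count_call(void *data) { ++*static_cast<int *>(data); }

int demo_flush() {
    AwakeRing ring;
    int calls = 0;
    ring.add(count_call, &calls); // queued with no wake-up message at all
    ring.add(count_call, &calls);
    int ran = flush_pending(ring);
    assert(ran == calls);
    assert(!ring.pending());
    return ran;
}
```

Because the check looks at the ring itself rather than at a message, handlers queued while no wake-up was delivered still run on the next pass of the loop.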
 
 
#16 AlbrechtS
05:36 Apr 22, 2015
Thanks for taking care of this, Ian. I didn't run the test app nor did I try to find out what really happens, i.e. why the messages are not executed in the normal flow.

However, I found out that the queueing of Fl::awake() messages happens in the context of the user's thread before it sends the message to the Windows message queue in src/Fl_lock.cxx, line #125:

int Fl::awake(Fl_Awake_Handler func, void *data) {
  int ret = add_awake_handler_(func, data);
  Fl::awake();
  return ret;
}

Fl::add_awake_handler_() queues the awake handler in the internal queue, and the other overloaded method Fl::awake(void* msg = 0) posts the Windows message (line #243-245, as you mentioned).

One possible explanation of the behavior would be that the latter doesn't happen when the window is moved or resized, but that is pure speculation.

OTOH the patch you suggested looks okay if we can't find a better solution. I'd probably put the common code in a static function we can call in that file for better maintenance later.

Side note: there are some other issues I found when looking at the code. See STR #3223.
http://www.fltk.org/str.php?L3223
 
 
#17 ianmacarthur
14:24 Apr 22, 2015
Time permitting, and if no one objects, I'll commit my "fix" for this sometime in the next few days.  
 
#18 ianmacarthur
03:13 Apr 23, 2015
Fixed in Subversion repository.  
     
