STR #3197
Application: | FLTK Library |
Status: | 1 - Closed w/Resolution |
Priority: | 3 - Moderate, e.g. unable to compile the software |
Scope: | 2 - Specific to an operating system |
Subsystem: | Unicode support |
Summary: | Odd behavior with fl_input and locale |
Version: | 1.3.3 |
Created By: | AlainBandon |
Assigned To: | AlbrechtS |
Fix Version: | Will Not Fix |
Trouble Report Files:
No files
Trouble Report Comments:
|
#1 | AlainBandon 08:28 Feb 23, 2015 |
| When typing an accented letter such as é, è or à (from a French keyboard) into a fl_input box, the letter is displayed correctly, but when I read the string back from the code (with buffer->text()), it contains Ã© (0xc3 0xa9) instead (probably the ascii representation of the utf8 encoding), while all other letters in the string are correct.
Conversely, when I insert from the code a string containing é (0xe9, encoded in ascii with the locale), the display is still correct, but the cursor behaves strangely: it is impossible to place it just before the é by clicking (only the left arrow key works). Even stranger, any string ending in "é\n" is displayed in a multiline text_display as if the \n were not there.
Long story short, this problem makes my text search input a real disaster when dealing with those letters. Is there a quick fix or workaround, at least for getting the "real string" with ascii-only chars? | |
|
#2 | AlainBandon 08:41 Feb 23, 2015 |
| erratum : myInput->value() instead of ->buffer()->text()
I found the magic function deciphering the buffer for the display : const char* Fl_Input_::expand(const char* p, char* buf) const
Is there a way to use this function or get the result of it ? | |
|
#3 | AlbrechtS 09:54 Feb 23, 2015 |
| FLTK 1.3 uses exclusively UTF-8 text encoding. There are options though that make FLTK accept _some_ text in ISO-8859-1 encoding and (maybe) display it accordingly instead of displaying an error. This is probably what you are seeing if you enter text in your locale into any widget by using value(char *). All user input, however, is definitely encoded in UTF-8. See this link:
http://www.fltk.org/doc-1.3/migration_1_3.html
Citation: "It is important that, although your software uses only ASCII characters for input to FLTK widgets, the user may enter non-ASCII characters, and FLTK will return these characters with UTF-8 encoding to your application, e.g. via Fl_Input::value(). You will need to re-encode them to your (non-UTF-8) encoding, otherwise you might see or print garbage in your data."
And this means really ASCII (range 32-126 for printable characters).
If you need more info about Unicode and UTF-8 please consult this link: http://www.fltk.org/doc-1.3/unicode.html
That said, what you are seeing might seem to work, but _ALL_ user input will be encoded in UTF-8, so if you save any user input data to a file, take care of this. If you use a FLTK input widget (your search input) to search text that is encoded in another encoding, this is not going to work.
You may be able to convert your input text to your locale encoding, but this is OT here (this behavior is not a bug). If you have further questions how to solve your problem, please ask in fltk.general. https://groups.google.com/forum/#!forum/fltkgeneral | |
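The re-encoding step mentioned in #3 can be sketched as follows. This is a minimal illustration, not FLTK code: `utf8_to_latin1` is a hypothetical helper, and it assumes the target encoding is ISO-8859-1 (Windows CP1252 agrees with it for 0xA0 to 0xFF, including é, è and à, but differs in the 0x80 to 0x9F range). FLTK's own `FL/fl_utf8.h` declares `fl_utf8toa()` for a similar purpose; check its documentation for the exact semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of FLTK): convert UTF-8 text, such as the
// result of Fl_Input::value(), to ISO-8859-1. Code points above U+00FF and
// malformed sequences are replaced by '?'.
std::string utf8_to_latin1(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            // Plain ASCII is identical in both encodings.
            out += static_cast<char>(c);
            ++i;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size() &&
                   (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80) {
            // Two-byte sequence encoding U+0080..U+00FF.
            unsigned char c2 = static_cast<unsigned char>(in[i + 1]);
            out += static_cast<char>(((c & 0x03) << 6) | (c2 & 0x3F));
            i += 2;
        } else {
            // Not representable in ISO-8859-1 (or malformed): emit one '?'
            // and skip any continuation bytes that belong to the sequence.
            out += '?';
            ++i;
            while (i < in.size() &&
                   (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80) {
                ++i;
            }
        }
    }
    return out;
}

// Usage check: "traité" typed into an input arrives as UTF-8 (C3 A9) and
// becomes the single byte E9 after conversion.
void check_utf8_to_latin1() {
    assert(utf8_to_latin1("trait\xC3\xA9") == "trait\xE9");
}
```

Replacing unrepresentable characters with '?' is a design choice for this sketch. A search feature might prefer to reject such input or to keep all data in UTF-8 instead.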
|
#4 | AlainBandon 10:48 Feb 23, 2015 |
| I understand the logic. I am supposed to read the value as utf8, and either re-encode it as utf-16 in wchar or convert it back to chars using the locale.
But this also means that there is a problem with copy and paste inside the input: I can copy an 'é' encoded in ascii (0xe9) from the data or from any external source and paste it into the input. In this case, instead of converting the 'é' into utf-8, the input keeps it as is. Correct me if I'm wrong, but this 'é' should be converted to utf-8 if I follow your logic. | |
|
#5 | AlainBandon 11:07 Feb 23, 2015 |
| It's really strange actually... The exact repro of the bug is the following: I have a display_text filled with ascii locale chars (coming from user data). As I explained before, the display is buggy when dealing with the 'é' char (\n suppressed and cursor misbehaving), and I understand that it is my fault for not encoding the string I want to display in utf-8.
But here comes the tricky part: I copy the word containing my 'é' and paste it into my search input, and the "bad" ascii 'é' is preserved. But if I paste it into any other software, such as notepad or firefox's search bar, then copy it again from there and finally paste it into my fltk search box, the bad ascii 'é' is correctly converted into utf-8.
So maybe the copy-to-clipboard function is simply not used correctly in fltk and assumes that everything copied from fltk is utf8 encoded (which should be the case but is not in my case). I know that nearly all apps using the clipboard for text (on Windows) generally encode the text as wide chars. | |
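For the display side described in #5, the locale-encoded user data would need to be converted to UTF-8 before it is handed to any FLTK widget. The sketch below shows that direction. `latin1_to_utf8` is a hypothetical helper and assumes the data is ISO-8859-1; FLTK declares `fl_utf8froma()` in `FL/fl_utf8.h` for a comparable conversion.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of FLTK): convert ISO-8859-1 bytes to UTF-8
// so the result can be passed to a widget that expects UTF-8.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte becomes two bytes
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII is unchanged.
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF become a two-byte sequence: 110000xx 10xxxxxx.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Usage check: the single byte E9 ('é' in ISO-8859-1) becomes C3 A9, and the
// trailing newline is preserved.
void check_latin1_to_utf8() {
    assert(latin1_to_utf8("trait\xE9\n") == "trait\xC3\xA9\n");
}
```

With the data converted once at the boundary, text copied out of the display widget would already be valid UTF-8, which is what #3 says FLTK expects.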
|
#6 | AlbrechtS 11:19 Feb 23, 2015 |
| If I was not clear: your input is wrong, thus any behavior dependent on this wrong input is undefined. Unfortunately FLTK's default behavior is defined to tolerate the wrong input and make it _look_ right, but it isn't. This is a compromise to be as kind as possible and not to modify your data.
That said, copy&paste and drag&drop work correctly if you use a correct source and destination. The data is converted accordingly during the d&d or c&p operation. I suggest testing this with a browser and FLTK's text editor (test/editor) or another working editor. I recommend notepad++ where you can set/show/convert the file encoding and do any drag&drop operations with FLTK widgets. The input and output will be converted.
Example: open a file encoded in your locale (I assume Windows CP1252 or ISO-8859-1 or similar) with notepad++. Use the editor's menu "encoding" to display the encoding and/or convert it. Even if the editor displays "Ansi" (aka CP1252) you can drag'n'drop text from that source into FLTK's editor or your search input. It _will_ be converted to UTF-8 on the fly.
Even if you open the file with FLTK's test/editor it will be converted to UTF-8 and you will see a warning popup.
HTH | |
|
#7 | AlbrechtS 11:29 Feb 23, 2015 |
| Okay, my last reply (#6) was after reading your post #4.
Regarding #5: My #6 may explain some parts of your observations. Wrong input leads to undefined (faulty) behavior. FLTK tries to display your data correctly, but OTOH it assumes that the data is in UTF-8 encoding when you copy it. This may sound weird, but that's the case.
Clipboard encoding is generally flexible, but I don't know the exact details. You can rely on the correct reading and writing however, if the application has the correct data. It will be converted on the fly. Try it yourself, but please don't do it with your wrong data in FLTK. This doesn't work.
The same is true for drag&drop. | |
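One way to catch the "wrong input" discussed in #6 and #7 before it reaches a widget is to test whether a byte string is well-formed UTF-8. The sketch below is a hypothetical standalone check, not FLTK's implementation (FLTK declares `fl_utf8test()` in `FL/fl_utf8.h` for a related purpose). It also hints at one possible, unconfirmed explanation for the missing "\n" reported in #1: the byte 0xE9 has the bit pattern 1110 1001, which in UTF-8 announces a three-byte sequence, so a decoder may treat the following bytes as part of the same character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
bool is_valid_utf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned int min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned int cp;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte

        if (i + len > s.size()) return false;  // sequence is truncated
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return false;   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += len;
    }
    return true;
}

// Usage check: the ISO-8859-1 form of "traité\n" is rejected, the UTF-8
// form is accepted.
void check_is_valid_utf8() {
    assert(!is_valid_utf8("trait\xE9\n"));
    assert(is_valid_utf8("trait\xC3\xA9\n"));
}
```

A check like this could decide whether incoming data needs conversion. Note that it only proves a string is well-formed UTF-8, not that UTF-8 was the intended encoding.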
|
#8 | AlbrechtS 11:35 Feb 23, 2015 |
| Well, here is another experiment you can try (but I didn't test it myself).
Use notepad++ as I suggested before. Open a text file in your locale encoding. notepad++ will show the correct encoding ("Ansi").
In the "encoding" menu you have two choices:
(1) check another encoding, e.g. UTF-8. (2) use "convert to UTF-8".
Try both. If you use (1) and do copy and paste, you will probably see similar results as in FLTK with your text, because notepad++ is in error about the encoding.
If you use (2) everything should work flawlessly. | |
|
#9 | AlainBandon 12:03 Feb 23, 2015 |
| I ran all your tests and everything works correctly: generally speaking, it is impossible to reproduce my bug using any external text, whatever encoding I use.
So I did the opposite, to understand what exactly in the clipboard is fucked when I copy a bad (fucked up) string from the display_text and paste it into the input.
Here is what I tried:
- copy the fucked text from the display_text and paste it into the input -> kept fucked
- copy the fucked text from the display_text and paste it into HxD as text -> é as 0xE9 (ANSI)
- copy the copied fucked text from HxD to the input -> no more fucked
And now I'm completely lost... | |
|
#10 | AlainBandon 12:43 Feb 23, 2015 |
| I installed a program called insideclipboard that lets you inspect all parts of the clipboard.
I tested copying the string "traité" from the ascii encoded display_text or from the input into the clipboard, then pasting it anywhere else and copying it to the clipboard again, and there is absolutely no difference in the clipboard content.
So I have absolutely no idea why it gets unfucked when pasted again... this is just magic X_X | |
|
#11 | AlbrechtS 12:56 Feb 23, 2015 |
| General support is not available via the STR form. Please post to the FLTK forums and/or mailing lists for general support.
Short form: garbage in, garbage out.
Long form: sometimes software is "guessing". Essentially that's what FLTK does as well when it is displaying your "Ansi" encoded text as you expect, which is _technically_ wrong (because your text is not UTF-8 encoded).
I'm sorry, I'd like to help you more, but this is really not the place to do so. Please post further questions, as I said before, in our user forum (fltk.general).
https://groups.google.com/forum/#!forum/fltkgeneral | |