STR #2822

Application: FLTK Library
Status: 5 - New
Priority: 1 - Request for Enhancement, e.g. asking for a feature
Scope: 3 - Applies to all machines and operating systems
Subsystem: Unicode support
Summary: Fl_Input UTF-8 handling
Created By: chris
Assigned To: Unassigned
Fix Version: Unassigned

Trouble Report Files:

No files

Trouble Report Comments:

#1 chris
10:37 Apr 12, 2012
Using Fl_Input::insert() with real UTF-8 strings, I found the
documentation and/or the functionality of the positional parameters
not clear/sufficient for my purpose.

Citing my email to fltk.general from 2012/04/12:

I have a question regarding the use of the Fl_Input::insert() method
in combination with UTF-8 strings.

If I want to insert a text after the second UTF-8 character, it
would seem natural to use:

---- snip ----
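#include <FL/Fl_Input.H>
#include <stdio.h>

int main()
{
  Fl_Input t(0,0,0,0);
  t.value( "ДБИЯ" );   // four characters
  t.position( 2 );     // intent: cursor after the second character
  printf( "t.value(): '%s' size=%d\n", t.value(), t.size());
  printf( "t.position(): %d\n", t.position());
  t.insert( "Ж" );     // insert at the current position
  printf( "t.value(): '%s' size=%d\n", t.value(), t.size());
  printf( "t.position(): %d\n", t.position());
  return 0;
}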
---- snip ----

But it looks like position(2) sets the position to the byte-offset 2
and not after the second *UTF-8 character*, as the outcome is:

---- snip ----
t.value(): 'ДБИЯ' size=8
t.position(): 2
t.value(): 'ДЖБИЯ' size=10
t.position(): 4
---- snip ----

How is it supposed to be done and is this the desired behaviour?

I don't know whether a change in behaviour, so that position means
'UTF-8 character position' and NOT 'byte offset', would break
the API, so this might be an issue for 1.4 or 3.0.

I also imagine that other methods of Fl_Input could suffer from
the same issue (size(), ...).
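
The conversion asked about above can be sketched in plain C++ without calling FLTK at all. The helper name utf8_byte_offset is hypothetical and not part of the FLTK API, and the sketch assumes the string is valid UTF-8:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not an FLTK function): returns the byte offset at
// which the character with index `char_index` starts in the UTF-8 string `s`.
// If `s` has fewer characters, the byte length of `s` is returned.
// Assumption: `s` contains valid UTF-8.
int utf8_byte_offset(const std::string &s, int char_index) {
  assert(char_index >= 0);
  std::size_t i = 0;
  while (i < s.size() && char_index > 0) {
    ++i;  // step over the lead byte of the current character
    // step over its continuation bytes (bit pattern 10xxxxxx)
    while (i < s.size() && (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80)
      ++i;
    --char_index;
  }
  return static_cast<int>(i);
}
```

With such a helper, the call in the example could presumably become t.position( utf8_byte_offset( t.value(), 2 ) ), which yields byte offset 4 for the string "ДБИЯ" because each of its characters occupies two bytes.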
#2 chris
04:38 Apr 14, 2012
Regarding documentation: Forget my complaint - I had looked at
the current document on the FLTK website, which is from 9/2011.
Since then the in-source documentation of Fl_Input_ has
improved considerably in this aspect and makes it clear now,
how the positional parameters are to be used.

So what's remaining is an RFE to extend the API to make it
possible to use UTF-8 character positions as arguments to
Fl_Input::replace() instead of byte offsets.

Fl_Text_Buffer::replace() would then be another candidate for
such an improvement.
#3 ianmacarthur
16:32 Mar 11, 2014
Though that change might be ABI breaking, so not for 1.3...?


Do we close this, move it to 1.4, or other?
#4 ianmacarthur
15:59 Sep 04, 2014

Is this STR still "active", or can we consider it for closure as "won't fix", or move it to 1.4 or something...?
#5 chris
22:37 Sep 04, 2014
... Pong!

I would say move it to 1.4.

From my view it is still a MUST for a toolkit that operates with UTF-8 to hide the byte positions from the user code and go for character positions in all its API.

But if you think I am wrong with this view, you may also close it - I have my wrappers in place...
#6 ianmacarthur
03:47 Sep 05, 2014
I'll move it to 1.4 for now.

Our desire to maintain the same API as 1.1 makes changing to glyph-based rather than byte-based positions tricky, but it would be a better option in a future variant!
#7 AlbrechtS
12:12 Sep 18, 2014
Changed priority to RFE, since this is what it is (now).


 (1) the documentation improvements are satisfied

 (2) the RFE is for an _additional_ API with characters instead of bytes

I don't believe that we would want to break the API, even in a new major version.

Please correct me if I'm wrong.
#8 matt
07:12 Jan 20, 2023
You can convert between number of bytes and number of characters with:

int fl_utf_nb_char(const unsigned char *buf, int len);


int fl_utf8strlen(const char *text, int len);
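
To illustrate the byte-to-character direction of that conversion, here is a minimal standalone sketch. The function count_utf8_chars is a hypothetical stand-in written for this example; it does not call the FLTK functions above and is only assumed to be similar in spirit. It expects valid UTF-8:

```cpp
#include <cassert>

// Hypothetical stand-in for a UTF-8 character counter (not an FLTK function):
// counts the characters in the first `len` bytes of `text`.
// Assumption: `text` is valid UTF-8 and `len` ends on a character boundary.
int count_utf8_chars(const char *text, int len) {
  assert(text != nullptr && len >= 0);
  int n = 0;
  for (int i = 0; i < len; ++i) {
    // every byte that is not a continuation byte (10xxxxxx) starts a character
    if ((static_cast<unsigned char>(text[i]) & 0xC0) != 0x80)
      ++n;
  }
  return n;
}
```

Applied to the output in comment #1, the cursor at byte offset 4 of 'ДЖБИЯ' corresponds to character index 2, i.e. count_utf8_chars(value, 4) returns 2.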
#9 AlbrechtS
07:51 Jan 20, 2023
@Matt: I don't think that the info you posted helps in any way to solve the issue.

What we'd need is a bunch of new methods that let the user input a character index rather than a byte count (offset).

We have, for instance:

  int Fl_Input_::replace(int b, int e, const char *text, int ilen = 0);

Docs say: "Deletes text from b to e and inserts the new string text. ..."

In this method `b' and `e' are byte offsets in the string buffer. IMHO we can't change this because it would break all programs that use it.

What we'd need is an additional method, like (maybe):

  int Fl_Input_::replace_char(int b, int e, const char *text, int ilen = 0);

where `b' and `e' are *character* indices (or offsets) in the string, such that you could replace text in a specific *column* (or columns) of an Fl_Input widget (provided you use a fixed font for displaying in "columns").

This is only one example of many, and the postfix "_char" could also be "_utf8" or "_uc" (for unicode) or anything else. The problem is that such methods would need UTF-8 character counting from the beginning of the buffer to determine the byte positions.

And last but not least: this would need a **bunch** of new methods that do character counting to find the correct byte offsets, not only in Fl_Input but also in many other widgets. At least if we wanted to be "complete" by any means.

I'm not sure if this is feasible, particularly since UTF-8 handling in FLTK is by now pretty old and well established. OTOH it could simplify user code if we had it, at the price of a lot more methods and character counting in many widgets.
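
As a rough illustration of what such a character-based method would have to do, the following sketch operates on a plain std::string instead of an Fl_Input buffer. The names char_to_byte and replace_chars are hypothetical and not FLTK API, and valid UTF-8 is assumed. Note the two scans from the beginning of the buffer, which is the character-counting cost mentioned above:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: byte offset at which character `char_index` starts,
// found by counting from the beginning of the buffer.
static int char_to_byte(const std::string &buf, int char_index) {
  std::size_t i = 0;
  while (i < buf.size() && char_index > 0) {
    ++i;  // lead byte
    while (i < buf.size() && (static_cast<unsigned char>(buf[i]) & 0xC0) == 0x80)
      ++i;  // continuation bytes
    --char_index;
  }
  return static_cast<int>(i);
}

// Hypothetical character-based replace: deletes the characters in [b, e)
// and inserts `text` in their place. `b` and `e` are character indices.
void replace_chars(std::string &buf, int b, int e, const std::string &text) {
  assert(b >= 0 && e >= b);
  int byte_b = char_to_byte(buf, b);  // first scan from the start
  int byte_e = char_to_byte(buf, e);  // second scan from the start
  buf.replace(byte_b, byte_e - byte_b, text);
}
```

For the string "ДБИЯ", replace_chars(buf, 2, 2, "Ж") inserts after the second character and gives "ДБЖИЯ", which is the result the original report was after.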

Comments are owned by the poster. All other content is copyright 1998-2023 by Bill Spitzak and others. This project is hosted by The FLTK Team.