|
#1 | AlbrechtS 11:39 Mar 15, 2019 |
| Please ignore. Testing ... | |
|
#2 | AlbrechtS 11:40 Mar 15, 2019 |
| Testing some non-ASCII text:
Delta = 0.004458 ( 4458 µsec) | |
|
#3 | AlbrechtS 13:25 Mar 15, 2019 |
| test. | |
|
#4 | AlbrechtS 13:26 Mar 15, 2019 |
| Comment #4 | |
|
#5 | AlbrechtS 13:27 Mar 15, 2019 |
| Testing some non-ASCII text:
Delta = 0.004458 ( 4458 µsec)
Euro: € | |
|
#6 | AlbrechtS 13:33 Mar 15, 2019 |
| äöüß | |
|
#7 | greg.ercolano 13:56 Mar 15, 2019 |
| 4458 µsec | |
|
#8 | greg.ercolano 13:58 Mar 15, 2019 |
| Added 'surrogateescape' to mail-out, 4458 µsec | |
|
#9 | greg.ercolano 13:59 Mar 15, 2019 |
| Added self to mailings for this str (not automatic), 'surrogateescape' solved the error, want to verify the character comes thru email. So: 4458 µsec | |
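For readers unfamiliar with it: 'surrogateescape' is a Python codec error handler that carries undecodable bytes through as lone surrogates instead of raising. The sketch below is only an illustration. The sample bytes and variable names are made up, and the thread does not show how the mail-out script actually applies the handler.

```python
# Hypothetical input: "4458 µsec" where µ arrived as the single Latin-1 byte
# 0xB5, which is not valid UTF-8 on its own.
raw = b"4458 \xb5sec"

try:
    raw.decode("utf-8")  # strict decoding rejects the stray byte
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With 'surrogateescape' the bad byte is carried along as U+DCB5 ...
text = raw.decode("utf-8", errors="surrogateescape")

# ... and encoding with the same handler restores the original bytes.
round_trip = text.encode("utf-8", errors="surrogateescape")
```

Note that this only avoids the exception. It does not repair the encoding: whatever bytes went in come out again unchanged.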
|
#10 | AlbrechtS 14:00 Mar 15, 2019 |
| --- Ä --- | |
|
#11 | greg.ercolano 14:04 Mar 15, 2019 |
| Well, in the email the text for the micro symbol is slightly mangled.
Just before the micro symbol I also see a capital "A" with a circumflex accent, so it's really showing up as two characters.
Here's a paste from the email, which includes the issue described above, just to see what happens when we run it through the machine again: --- Added self to mailings for this str (not automatic), 'surrogateescape' solved the error, want to verify the character comes thru email. So: 4458 µsec --- | |
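An extra capital A with a circumflex in front of the micro sign is what you would expect if the two UTF-8 bytes of µ are read as a single-byte charset. The sketch below reproduces that effect, assuming cp1252 on the reading side. That is an assumption: the thread does not confirm which charset the mail client used.

```python
micro = "\u00b5"                    # µ MICRO SIGN
utf8_bytes = micro.encode("utf-8")  # two bytes: 0xC2 0xB5

# Misreading those two bytes as cp1252 yields two characters: "Â" + "µ".
mangled = utf8_bytes.decode("cp1252")

# Reversing the mistake recovers the original single character.
repaired = mangled.encode("cp1252").decode("utf-8")
```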
|
#12 | AlbrechtS 14:07 Mar 15, 2019 |
| More Unicode characters:
$ cat fltk-1.4/misc/cp1252_utf-8.txt
+---------------+-------------------------------------------------+
| octal --> | 0 1 2 3 4 5 6 7 10 11 12 13 14 16 16 17 |
| | dec. -> | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
| v | v |hex| 0 1 2 3 4 5 6 7 8 9 A B C D E F |
+-----+-----+---+-------------------------------------------------+
| 0 | 0 | 0 | |
| 20 | 16 | 1 | |
| 40 | 32 | 2 | ! "" # $ % & ' ( ) * + , - . / |
| 60 | 48 | 3 | 0 1 2 3 4 5 6 7 8 9 : ; < = > ? |
| 100 | 64 | 4 | @ A B C D E F G H I J K L M N O |
| 120 | 80 | 5 | P Q R S T U V W X Y Z [ \ ] ^ _ |
| 140 | 96 | 6 | ` a b c d e f g h i j k l m n o |
| 160 | 112 | 7 | p q r s t u v w x y z { | } ~ |
| 200 | 128 | 8 | € ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž |
| 220 | 144 | 9 | ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ |
| 240 | 160 | A | ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ® ¯ |
| 260 | 176 | B | ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ |
| 300 | 192 | C | À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï |
| 320 | 208 | D | Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß |
| 340 | 224 | E | à á â ã ä å æ ç è é ê ë ì í î ï |
| 360 | 240 | F | ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ |
+-----+-----+---+-------------------------------------------------+
File: cp1252_utf-8.txt Encoding: UTF-8
Euro sign: € € € € € | |
|
#13 | greg.ercolano 14:09 Mar 15, 2019 |
| Well, now it inserted two more characters: this time a capital A with a tilde over it, followed by a comma.
So this is all interesting, but I have no solution to offer. At least the email is going through now. I don't know how to translate this character.
Or I should say: I'm guessing it's maybe not Unicode but the ISO-whatever encoding that Microsoft sometimes uses, but I don't know how to detect that and switch it in when needed.
My guess is we should either simply reject comments that aren't properly UTF-8 encoded in the PHP, or try to detect and convert non-UTF-8 strings on the fly somehow. | |
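Both options mentioned above (reject input that is not valid UTF-8, or detect and convert it) can be sketched roughly as follows. The site code is PHP, so this Python version is purely illustrative. The function names are hypothetical, and the cp1252 fallback is an assumption about what non-UTF-8 input would most likely be.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_text(data: bytes) -> str:
    """Decode as UTF-8 if possible, otherwise assume a single-byte charset."""
    for encoding in ("utf-8", "cp1252"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this last resort never raises.
    return data.decode("latin-1")
```

Trying UTF-8 first works reasonably well in practice because text in a single-byte charset rarely happens to form valid multi-byte UTF-8 sequences, but it remains a heuristic. In PHP, `mb_check_encoding()` can serve as the validity check.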
|
#14 | AlbrechtS 14:11 Mar 15, 2019 |
| $ cat fltk-1.4/misc/cp1252_utf-8.txt
+---------------+-------------------------------------------------+
| octal --> | 0 1 2 3 4 5 6 7 10 11 12 13 14 16 16 17 |
| | dec. -> | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
| v | v |hex| 0 1 2 3 4 5 6 7 8 9 A B C D E F |
+-----+-----+---+-------------------------------------------------+
| 0 | 0 | 0 | |
| 20 | 16 | 1 | |
| 40 | 32 | 2 | ! " # $ % & ' ( ) * + , - . / |
| 60 | 48 | 3 | 0 1 2 3 4 5 6 7 8 9 : ; < = > ? |
| 100 | 64 | 4 | @ A B C D E F G H I J K L M N O |
| 120 | 80 | 5 | P Q R S T U V W X Y Z [ \ ] ^ _ |
| 140 | 96 | 6 | ` a b c d e f g h i j k l m n o |
| 160 | 112 | 7 | p q r s t u v w x y z { | } ~ |
| 200 | 128 | 8 | € ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž |
| 220 | 144 | 9 | ‘ ’ “ ” • – — ™ š › œ ž Ÿ |
| 240 | 160 | A | ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ® ¯ |
| 260 | 176 | B | ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ |
| 300 | 192 | C | À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï |
| 320 | 208 | D | Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß |
| 340 | 224 | E | à á â ã ä å æ ç è é ê ë ì í î ï |
| 360 | 240 | F | ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ |
+-----+-----+---+-------------------------------------------------+
File: cp1252_utf-8.txt Encoding: UTF-8
Euro sign: € € € € € - End of text - | |
|
#15 | AlbrechtS 14:16 Mar 15, 2019 |
| My last comment was properly formatted UTF-8 text containing the MS cp1252 characters.
I don't see additional characters around the micro (µ) sign, but some of the characters in the character table in the range 0x80 - 0x9f are not correctly encoded. However, these characters are really some higher Unicode code points. I'll try with pure ISO-Latin1 (ISO-8859-1) encoded in UTF-8 in my next post. | |
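To illustrate the point about the 0x80 - 0x9f range: in cp1252 those byte values map to code points well above U+00FF, so they need three bytes each in UTF-8, while the 0xA0 - 0xFF range maps one-to-one onto U+00A0 - U+00FF. A small sketch using Python's built-in cp1252 codec follows; the helper name is made up for this example.

```python
def cp1252_info(byte: int) -> tuple[int, bytes]:
    """Return (Unicode code point, UTF-8 bytes) for one cp1252 byte value."""
    ch = bytes([byte]).decode("cp1252")
    return ord(ch), ch.encode("utf-8")


# A few sample positions: euro sign, ellipsis, trademark sign, micro sign.
for byte in (0x80, 0x85, 0x99, 0xB5):
    code_point, utf8 = cp1252_info(byte)
    print(f"cp1252 0x{byte:02X} -> U+{code_point:04X} -> UTF-8 {utf8.hex()}")
```

Python's codec treats a few positions in that range (for example 0x81) as unassigned and raises `UnicodeDecodeError` for them.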
|
#16 | AlbrechtS 14:18 Mar 15, 2019 |
| Pure UTF-8 text following. All characters are Unicode Code Points in the range U+0000 - U+00FF:
$ cat fltk-1.4/misc/iso-8859-1_utf-8.txt
+---------------+-------------------------------------------------+
| octal --> | 0 1 2 3 4 5 6 7 10 11 12 13 14 16 16 17 |
| | dec. -> | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
| v | v |hex| 0 1 2 3 4 5 6 7 8 9 A B C D E F |
+-----+-----+---+-------------------------------------------------+
| 0 | 0 | 0 | |
| 20 | 16 | 1 | |
| 40 | 32 | 2 | ! "" # $ % & ' ( ) * + , - . / |
| 60 | 48 | 3 | 0 1 2 3 4 5 6 7 8 9 : ; < = > ? |
| 100 | 64 | 4 | @ A B C D E F G H I J K L M N O |
| 120 | 80 | 5 | P Q R S T U V W X Y Z [ \ ] ^ _ |
| 140 | 96 | 6 | ` a b c d e f g h i j k l m n o |
| 160 | 112 | 7 | p q r s t u v w x y z { | } ~ |
| 200 | 128 | 8 | |
| 220 | 144 | 9 | |
| 240 | 160 | A | ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ® ¯ |
| 260 | 176 | B | ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ |
| 300 | 192 | C | À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï |
| 320 | 208 | D | Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß |
| 340 | 224 | E | à á â ã ä å æ ç è é ê ë ì í î ï |
| 360 | 240 | F | ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ |
+-----+-----+---+-------------------------------------------------+
File: iso-8859-1_utf-8.txt Encoding: UTF-8 | |
|
#17 | greg.ercolano 14:39 Mar 15, 2019 |
| Testing after adding these mail headers to globals.php:
Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Content-Language: en-US
So testing the special character again: 4458 µsec | |
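For comparison, the same three headers can be produced with Python's standard `email` package. This is only a sketch of what a message with those headers looks like. It is not the code in globals.php, and the subject line is a placeholder.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "charset test"  # placeholder subject

# set_content() adds Content-Type and Content-Transfer-Encoding itself.
msg.set_content("4458 \u00b5sec", charset="utf-8", cte="8bit")

# Add this after set_content(), which clears existing Content-* headers.
msg["Content-Language"] = "en-US"

raw = msg.as_bytes()  # wire format; the body is left as raw UTF-8 bytes
```

Without a declared charset, a mail client has to guess how to interpret bytes above 0x7F, which is where stray extra characters tend to come from.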
|
#18 | AlbrechtS 14:44 Mar 15, 2019 |
| Pure UTF-8 text following. All characters are Unicode Code Points in the range U+0000 - U+00FF:
$ cat fltk-1.4/misc/iso-8859-1_utf-8.txt
+---------------+-------------------------------------------------+
| octal --> | 0 1 2 3 4 5 6 7 10 11 12 13 14 16 16 17 |
| | dec. -> | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
| v | v |hex| 0 1 2 3 4 5 6 7 8 9 A B C D E F |
+-----+-----+---+-------------------------------------------------+
| 0 | 0 | 0 | |
| 20 | 16 | 1 | |
| 40 | 32 | 2 | ! "" # $ % & ' ( ) * + , - . / |
| 60 | 48 | 3 | 0 1 2 3 4 5 6 7 8 9 : ; < = > ? |
| 100 | 64 | 4 | @ A B C D E F G H I J K L M N O |
| 120 | 80 | 5 | P Q R S T U V W X Y Z [ \ ] ^ _ |
| 140 | 96 | 6 | ` a b c d e f g h i j k l m n o |
| 160 | 112 | 7 | p q r s t u v w x y z { | } ~ |
| 200 | 128 | 8 | |
| 220 | 144 | 9 | |
| 240 | 160 | A | ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ® ¯ |
| 260 | 176 | B | ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ |
| 300 | 192 | C | À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï |
| 320 | 208 | D | Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß |
| 340 | 224 | E | à á â ã ä å æ ç è é ê ë ì í î ï |
| 360 | 240 | F | ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ |
+-----+-----+---+-------------------------------------------------+
File: iso-8859-1_utf-8.txt Encoding: UTF-8 | |
|
#19 | AlbrechtS 14:48 Mar 15, 2019 |
| Posting full Windows CP-1252 character set in UTF-8 encoding:
$ cat fltk-1.4/misc/cp1252_utf-8.txt | cut -b 17-999
+-------------------------------------------------+
| 0 1 2 3 4 5 6 7 10 11 12 13 14 16 16 17 |
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
| 0 1 2 3 4 5 6 7 8 9 A B C D E F |
+-------------------------------------------------+
| |
| |
| ! "" # $ % & ' ( ) * + , - . / |
| 0 1 2 3 4 5 6 7 8 9 : ; < = > ? |
| @ A B C D E F G H I J K L M N O |
| P Q R S T U V W X Y Z [ \ ] ^ _ |
| ` a b c d e f g h i j k l m n o |
| p q r s t u v w x y z { | } ~ |
| € ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž |
| ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ |
| ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ® ¯ |
| ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ |
| À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï |
| Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß |
| à á â ã ä å æ ç è é ê ë ì í î ï |
| ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ |
+-------------------------------------------------+ | |
|
#20 | AlbrechtS 15:00 Mar 15, 2019 |
| Korean: 국민경제의 발전을
Greek: Λορεμ ιπσθμ δολορ
Japanese: 旅ロ京青利セムレ弱改
Russian: Лорем ипсум долор сит амет
Chinese: 側経意責家方家閉討店
Indian: पढाए हिंदी रहारुप अनुवाद
Selected Unicode Characters
U+00E9F: "ຟ" Lao Letter Fo Sung
U+0231A: "⌚" Watch
U+103AE: "𐎮" Old Persian Sign Di
U+1D11E: "𝄞" Musical Symbol G Clef
U+1F39C: "🎜" Beamed Ascending Musical Notes
U+1F4A9: "💩" Pile of Poo
U+1F720: "🜠" Alchemical Symbol For Copper Ore
U+1F913: "🤓" Nerd Face | |
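The entries above U+FFFF in this list lie outside the Basic Multilingual Plane and take four bytes each in UTF-8, so they exercise a different path than the two-byte Latin-1 characters tested earlier. The byte counts can be checked with a few lines; the helper name is made up and the labels are copied from the list above.

```python
samples = {
    0x0E9F: "Lao Letter Fo Sung",
    0x231A: "Watch",
    0x103AE: "Old Persian Sign Di",
    0x1D11E: "Musical Symbol G Clef",
    0x1F4A9: "Pile of Poo",
}


def utf8_len(code_point: int) -> int:
    """Number of bytes UTF-8 needs to encode one code point."""
    return len(chr(code_point).encode("utf-8"))


for code_point, label in samples.items():
    print(f"U+{code_point:05X} {label}: {utf8_len(code_point)} bytes")
```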
|
#21 | AlbrechtS 04:19 Mar 16, 2019 |
| Another UTF-8 mail test...
Windows Codepage 1252 ("Ansi") Character Table
+-----+-----+---+-------------------------------------------------+
| 200 | 128 | 8 | € ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž |
| 220 | 144 | 9 | ‘ ’ “ ” • – — ~ ™ š › œ ž Ÿ |
| 240 | 160 | A | ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ - ® ¯ |
| 260 | 176 | B | ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ |
| 300 | 192 | C | À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï |
| 320 | 208 | D | Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß |
| 340 | 224 | E | à á â ã ä å æ ç è é ê ë ì í î ï |
| 360 | 240 | F | ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ |
+-----+-----+---+-------------------------------------------------+
Selected Unicode Characters
U+00E9F: "ຟ" Lao Letter Fo Sung
U+0231A: "⌚" Watch
U+103AE: "𐎮" Old Persian Sign Di
U+1D11E: "𝄞" Musical Symbol G Clef
U+1F39C: "🎜" Beamed Ascending Musical Notes
U+1F4A9: "💩" Pile of Poo
U+1F720: "🜠" Alchemical Symbol For Copper Ore
U+1F913: "🤓" Nerd Face | |
|
#22 | AlbrechtS 04:25 Mar 16, 2019 |
| | À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï |
| Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß |
| à á â ã ä å æ ç è é ê ë ì í î ï |
| ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ | | |