Re: 16 bits wchar [message #17268 is a reply to message #17266] |
Tue, 05 August 2008 12:03 |
cbpporter
Messages: 1401 Registered: September 2007
|
Ultimate Contributor |
|
|
luzr wrote on Tue, 05 August 2008 01:51 |
Anyway, might I ask you to think about / comment codepoint == glyph and distinct(codepoint) < 64K claims?
|
I really can't imagine how that would be possible.
First of all, how do you expect to squish almost 100K characters in 64K? Some kind of dynamic character set loading would be needed, and still a string could not contain every possible character.
And second, in Unicode codepoint != glyph. All the 90k+ codepoints can theoretically be combined to produce an endless number of glyphs. Think of Unicode as a comparatively feature-poor Qtf. Codepoints are commands. 99% of commands are "print glyph X", but the rest allow you to manipulate the layout and appearance of a glyph. It is not a visual manipulation, like with a font, but rather a manipulation that alters the abstract concept of a glyph, like adding diacritics.
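To make the "codepoints are commands" idea concrete, here is a minimal sketch in C++. The names `IsCombiningDiacritic` and `GroupIntoGlyphs` are made up for illustration and are not part of U++. It also assumes, as a simplification, that only the Combining Diacritical Marks block (U+0300..U+036F) modifies the previous glyph; real text needs the full Unicode rules.

```cpp
#include <string>
#include <vector>

// Simplification (assumption): only the Combining Diacritical Marks block
// U+0300..U+036F is treated as "modify the previous glyph".
bool IsCombiningDiacritic(char32_t c)
{
    return c >= 0x0300 && c <= 0x036F;
}

// Group code points into visual units: one base code point followed by
// the combining marks that alter it. Hypothetical helper, not a U++ API.
std::vector<std::u32string> GroupIntoGlyphs(const std::u32string& text)
{
    std::vector<std::u32string> glyphs;
    for (char32_t c : text) {
        if (IsCombiningDiacritic(c) && !glyphs.empty())
            glyphs.back() += c;                       // "alter previous glyph" command
        else
            glyphs.push_back(std::u32string(1, c));   // "print glyph X" command
    }
    return glyphs;
}
```

With this sketch, `U"\u00E9"` (precomposed é) and `U"e\u0301"` (e followed by a combining acute accent) both group into a single visual unit, even though the second string holds two codepoints.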
The reason why this is not that obvious is that the Win API handles this for you automatically. Most users and even developers are not familiar with this process, and if their input data somehow contains such characters, Win controls will display them correctly. All common diacritics are handled pretty well, but uncommon ones are often handled incorrectly. This could be one of U++'s strong points in the future. When all font issues are resolved (probably not before 2009.1), if we offered full combining character support algorithmically where fonts fail, we would certainly be in a relatively unique position.
But since we don't use native controls, we are more exposed to these problems. Under Windows, when you use such text in non-editable controls in U++, you get the correct result, but if you use an EditString, for example, you have to press the cursor keys multiple times to step through a character which visually consists of only one glyph but is represented by several code points.
This problem can be addressed relatively easily by updating a couple of functions and making sure that the Windows API always gets full chunks of text.
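As a rough illustration of the kind of function that would need updating, here is a sketch of cursor movement that treats a base codepoint plus its combining marks as one step. `IsCombiningMark`, `NextCaretPos` and `PrevCaretPos` are hypothetical names, not existing U++ functions, and the list of combining ranges is deliberately incomplete; a full solution would follow the Unicode grapheme cluster rules.

```cpp
#include <cstddef>
#include <string>

// Assumption: a small, incomplete set of combining mark blocks.
bool IsCombiningMark(char32_t c)
{
    return (c >= 0x0300 && c <= 0x036F)    // Combining Diacritical Marks
        || (c >= 0x1AB0 && c <= 0x1AFF)    // Combining Diacritical Marks Extended
        || (c >= 0x1DC0 && c <= 0x1DFF)    // Combining Diacritical Marks Supplement
        || (c >= 0x20D0 && c <= 0x20FF)    // Combining Diacritical Marks for Symbols
        || (c >= 0xFE20 && c <= 0xFE2F);   // Combining Half Marks
}

// Caret position after one "cursor right" step from pos.
std::size_t NextCaretPos(const std::u32string& s, std::size_t pos)
{
    if (pos >= s.size())
        return s.size();
    ++pos;                                            // step over the base code point
    while (pos < s.size() && IsCombiningMark(s[pos]))
        ++pos;                                        // and over every mark attached to it
    return pos;
}

// Caret position after one "cursor left" step from pos.
std::size_t PrevCaretPos(const std::u32string& s, std::size_t pos)
{
    if (pos > s.size())
        pos = s.size();
    while (pos > 0) {
        --pos;
        if (!IsCombiningMark(s[pos]))
            break;                                    // stop in front of the base code point
    }
    return pos;
}
```

The same boundaries could be used to decide where a string may be split before it is handed to the platform text API, so that a base character is never separated from its marks.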
Under Linux, such support is a lot poorer. Since we send text to X one codepoint at a time, no composition can take place. And I don't even know if the X methods that are in use can handle such texts. All my experiments in U++ gave the same result: diacritics are removed and the rest of the characters are displayed as whitespace. KDE editors seemed quite happy with such codes, while gedit displayed the characters correctly but without composing them in the same place, so basically it did not do any better than U++ would if we had font pooling.
As always, I come to the same conclusion: nobody really cares about proper internationalization and Unicode (except Qt and KDE, which seem to have the best support of all, comparable to and maybe better than Windows, but seemingly poorer because of the available fonts).
Quote: |
Hm, I was thinking about our problem a lot....
I believe that we should do one important thing first - scan all available fonts and count/list all codepoints there...
|
Yes, that would help under Windows and is a must under Linux. We could even use some "heuristics", i.e. if a font has 2 Arabic characters, there is a high probability that it handles all Arabic characters from that given Unicode range. Maybe we can get away with splitting all codepoints into ranges on a per-script basis and only testing some key characters, but I can't be sure without testing.
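A sketch of how that heuristic might look, untested against real fonts. `ScriptProbe` and `GuessCoveredScripts` are invented names, the key characters would have to be picked per script, and `hasGlyph` stands in for whatever platform call actually reports whether a font contains a codepoint.

```cpp
#include <functional>
#include <string>
#include <vector>

// One script with a few representative code points to probe (hypothetical type).
struct ScriptProbe {
    std::string name;
    std::vector<char32_t> keyChars;
};

// Guess which scripts a font covers by testing only the key characters.
// hasGlyph is an assumed callback wrapping the real per-font glyph lookup.
std::vector<std::string> GuessCoveredScripts(
    const std::vector<ScriptProbe>& probes,
    const std::function<bool(char32_t)>& hasGlyph)
{
    std::vector<std::string> covered;
    for (const ScriptProbe& p : probes) {
        bool all = !p.keyChars.empty();       // a script with no probes is never "covered"
        for (char32_t c : p.keyChars) {
            if (!hasGlyph(c)) {
                all = false;
                break;
            }
        }
        if (all)
            covered.push_back(p.name);        // heuristic: key characters present, assume the rest
    }
    return covered;
}
```

For example, a probe list could hold `{"Arabic", {0x0627, 0x0628}}` (ALEF and BEH) and `{"Greek", {0x03B1, 0x03C9}}` (alpha and omega). Whether two characters per script are enough is exactly what would need testing.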