Home » U++ Library support » U++ Libraries and TheIDE: i18n, Unicode and Internationalization » 16 bits wchar
Re: 16 bits wchar [message #17277 is a reply to message #17268] |
Tue, 05 August 2008 15:12   |
 |
mirek
Messages: 14255 Registered: November 2005
|
Ultimate Member |
|
|
cbpporter wrote on Tue, 05 August 2008 06:03 |
luzr wrote on Tue, 05 August 2008 01:51 |
Anyway, might I ask you to think about / comment codepoint == glyph and distinct(codepoint) < 64K claims?
|
I really can't imagine how that would be possible.
First of all, how do you expect to squish almost 100K characters in 64K? Some kind of dynamic character set loading would be needed, and still a string could not contain every possible character.
|
Yes, meanwhile I have studied it a little bit more, you are right.
Quote: |
And second, in Unicode codepoint != glyph. All the 90k+ codepoints can be combined theoretically to produce and endless number of glyphs. Think of Unicode as a comparably more feature poor Qtf. Codepoints are commands. 99% of commands are "print glyph X", but the rest allow you to manipulate the layout and appearance of glyph. It is not a visual manipulation, like with font, rather manipulation that alters the abstract concept of a glyph, like adding diacritics.
|
Well, I was studying this as well and came to conclusion that combining is of little concern.
First, AFAIK, basic Unicode "compliance" does not requite it.
Second, all important ("real") combining codepoints have characters in Unicode.
IMO, I would regard combining as sort of formating info, similar to '\n' or '\t' - something that we need to be aware about (and, in fact, we already are, sort of, see UnicodeCombine...) but do not need to actively support in editors etc...
BTW, that UnicodeCombine is exactly the sort of support that makes sense.
Quote: |
All common diacritics are handled pretty well, but uncommon ones which are often incorrectly handled.
|
This is because there is no general way how to create combined glyph....
Quote: |
Under Windows, when you use such text in non editable controls in U++, you get correct result, but if you use an EditString for example, you have to press cursor keys multiple times to step through a character which visually is made out of only one glyph, but uses several code points as representation.
|
Does not make sense to me... 
Quote: |
This problem can be relatively easily addressed, by updating a couple of functions and making sure than Windows API always gets full chunks of text.
|
IMO, this would be pretty hard to address in fact. Or result in confusing user interface.
Quote: |
Under Linux, such support is a lot poorer. Since we send to X text one codepoint at a time, no composition can take place.
|
Actually, we do not. Interface accepts strings. But I doubt it manages combining.
Quote: |
And I don't even know if the methods from X that are in use can handle such texts. All my experiments in U++ gave the same result: diacritics are removed and the rest of characters are displayed as whitespace. KDE editors seemed quite happy with such codes, while gedit displayed the characters correctly, but without composing them in the same place., so basically it did not do any better than U++ if we would have font pooling.
|
We will, I promise (Well, I would rather describe it as "font substitution"...).
Quote: |
As always, I come to the same conclusion: nobody really cares for proper internationalization and Unicode
|
The question is how combining really helps... IMO, it is not worth the enormous trouble it brings...
Quote: |
Yes, that would help under windows and is must under Linux. We could even use some "heuristics", i.e. if a font has 2 Arabic characters, there is a high probability that it handles all Arabic characters from that given Unicode range. Maybe we can get away by splinting all codepoints into ranges on a per script basis, and only test some key characters, but I can't be sure without testing.
|
Oh, for the beginning, I was rather thinking about "offline experimental scan" to find out what is really going on 
Maybe we should then match "standard substitution fonts" for all basic fonts.
Also interesting point is what then happens to my heurestic "glyph fixing" for characters 256-512 (U++ synthetises missing glyphs there by combining characters 0-256). So it is sort of alternative approach to font substitution. But I would keep it as it results in better looking texts.
Mirek
|
|
|
 |
|
16 bits wchar
By: riri on Mon, 05 February 2007 17:19
|
 |
|
Re: 16 bits wchar
By: mirek on Mon, 05 February 2007 23:07
|
 |
|
Re: 16 bits wchar
By: cbpporter on Tue, 25 September 2007 22:03
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 25 September 2007 23:18
|
 |
|
Re: 16 bits wchar
By: cbpporter on Wed, 26 September 2007 07:43
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 26 September 2007 08:48
|
 |
|
Re: 16 bits wchar
By: sergei on Wed, 26 September 2007 14:55
|
 |
|
Re: 16 bits wchar
By: cbpporter on Wed, 26 September 2007 15:37
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 26 September 2007 22:40
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Mon, 01 October 2007 14:28
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 03 October 2007 10:11
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 03 October 2007 10:42
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 03 October 2007 10:26
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 03 October 2007 12:10
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 03 October 2007 21:40
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Thu, 04 October 2007 17:33
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Fri, 12 October 2007 11:52
|
 |
|
Re: 16 bits wchar
By: mirek on Fri, 12 October 2007 11:59
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Fri, 12 October 2007 17:03
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Sun, 21 October 2007 20:19
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Sun, 21 October 2007 23:57
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Mon, 22 October 2007 10:47
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Mon, 22 October 2007 19:37
|
 |
|
Re: 16 bits wchar
By: mirek on Sun, 21 October 2007 20:14
|
 |
|
Re: 16 bits wchar
By: sergei on Wed, 26 September 2007 01:56
|
 |
|
Re: 16 bits wchar
By: sergei on Wed, 26 September 2007 16:54
|
 |
|
Re: 16 bits wchar
By: cbpporter on Wed, 26 September 2007 19:11
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 24 October 2007 13:27
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Sat, 27 October 2007 11:11
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Fri, 09 November 2007 10:39
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Sun, 11 November 2007 18:45
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 23 July 2008 22:04
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Mon, 04 August 2008 15:07
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Mon, 04 August 2008 17:14
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 00:03
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 00:14
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 00:20
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 00:26
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 00:51
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 10:42
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 15:12
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 15:19
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Thu, 07 August 2008 16:10
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Thu, 07 August 2008 17:40
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Thu, 07 August 2008 20:01
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Fri, 08 August 2008 15:32
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Fri, 08 August 2008 18:25
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: cbpporter on Fri, 05 September 2008 19:13
|
 |
|
Re: 16 bits wchar
By: mirek on Sun, 07 September 2008 13:24
|
 |
|
Re: 16 bits wchar
By: mirek on Mon, 04 August 2008 15:03
|
 |
|
Re: 16 bits wchar
By: mirek on Sat, 27 October 2007 11:01
|
Goto Forum:
Current Time: Mon Apr 28 16:50:10 CEST 2025
Total time taken to generate the page: 0.00670 seconds
|