Re: 16 bits wchar [message #17322 is a reply to message #17300] |
Thu, 07 August 2008 08:41 |
cbpporter
Messages: 1401 Registered: September 2007
|
Ultimate Contributor |
|
|
OK, the merge went well and, except for some issues that I expected, no new ones appeared. New characters can even be inserted and displayed in Qtf, but the way text metrics are handled in Qtf makes it look and behave slightly differently, which suggests that Qtf uses a more manual text layout scheme than the other output methods in U++. I will look into it.
I'm using this text image:
This is pretty much a reference Windows rendering with font Arial 24. First character is CJK, second is 'i', third CJK, fourth CJK from SIP, then CJK again and Latin 'M'.
And here is a side by side comparison in three different applications:
First is OpenOffice, second is Notepad, and third is U++ with a Label and an EditField.
So let me congratulate OpenOffice for completely forgetting to display my SIP character! Not even a black box. But if you try to navigate with the cursor, it acts as if there were an invisible character at that position. Even the super beta KOffice for Windows, which is an unusable piece of software, gets it right. Notepad and Wordpad can handle it too: Notepad renders it all, while Wordpad renders a black box, since it takes the font specification literally and doesn't seem to do font pooling. Changing the font results in correct display though.
Next is U++. As you can see, the display works fine, except I don't understand why Arial(24) does not look the same as in all the other applications. It looks smaller, even without font zooming. I need to fix this somehow.
-
Attachment: font2.PNG
(Size: 1.25KB, Downloaded 958 times)
-
Attachment: font.PNG
(Size: 4.33KB, Downloaded 900 times)
|
Re: 16 bits wchar [message #17351 is a reply to message #17331] |
Fri, 08 August 2008 13:34 |
cbpporter
Messages: 1401 Registered: September 2007
|
Ultimate Contributor |
|
|
I fixed Qtf to accept 4-byte UTF-8. It is strange that it doesn't accept it when passed directly, but if you copy & paste into a control like RichEdit, it has no problems. Probably because only ParseQtf cares about correct codes, and copy/paste is not checked.
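For reference, this is roughly what accepting 4-byte UTF-8 means. It is only a minimal standalone sketch, not the actual ParseQtf code: `DecodeUtf8` is a hypothetical helper name, and it assumes `pos` is a valid index into the string.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper for illustration; not part of U++.
// Decodes the UTF-8 sequence starting at s[pos] (pos must be < s.size()).
// On success returns the code point and advances pos past the sequence.
// On malformed input returns -1 and advances pos by a single byte.
std::int32_t DecodeUtf8(const std::string& s, std::size_t& pos)
{
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    int len;
    std::int32_t cp;
    if (lead < 0x80)                { len = 1; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; } // non-BMP
    else { pos++; return -1; }

    if (pos + len > s.size()) { pos++; return -1; } // truncated sequence

    for (int i = 1; i < len; i++) {
        const unsigned char c = static_cast<unsigned char>(s[pos + i]);
        if ((c & 0xC0) != 0x80) { pos++; return -1; } // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }

    // Reject overlong encodings, surrogate code points and values past U+10FFFF.
    static const std::int32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        pos++;
        return -1;
    }
    pos += len;
    return cp;
}
```

A parser that stops at the 3-byte case never produces anything above U+FFFF, which would match the symptom of SIP characters being rejected when passed directly.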
Then I continued fixing EditField navigation. And here I found some interesting problems. Using FontInfo, I keep getting wrong widths for SIP (non BMP) characters, even though it uses the right code points.
So I did some testing and rendered a text composed out of a SIP character and a BMP character using six different fonts:
The first sample uses StdFont, whatever that may be, and it has alignment problems. The second is Arial, and the third and fourth are the Windows Japanese fonts MS Mincho and MS Gothic. The standard CJK Windows fonts should obviously be the best choice, yet they have the worst alignment problems. Numbers five and six are HAN NOM A (Plane 0) and HAN NOM B (Plane 2), free fonts that have all the needed characters.
As you can see, most of the samples are not rendered correctly. It is OK to have such problems when mixing Latin fonts with CJK fonts, but the last four fonts are all CJK. The problem is that Windows uses its fallback font exclusively for SIP characters. I couldn't even find a Win API function that uses anything larger than a word when enumerating Unicode ranges. And even if a font contains a SIP character, Windows font pooling does not manage to find it. It is clear from the screenshot that the first character is always drawn from the same font, and is somehow coerced by the font rendering engine to look more like the selected font. But for CJK fonts, making them look more like Arial or Verdana doesn't make much sense, and I'm sure that users would not appreciate this. It is clear that all the first characters are drawn from HAN NOM B, because this is my system fallback plane 2 font. If I disable it I get this result:
Only the 5th sample can draw the first character, because it has its font given explicitly, and it messes up the second one, because it takes it from a different font.
So my conclusion is the following: Windows tries to render with the given font, for example Times New Roman. When it finds a non-BMP character, it changes the font to the system fallback font for that given font and tries to apply Times New Roman hinting and weight to it. And it fails pretty badly in most cases. This is probably why using FontInfo gives wrong widths: it fails to change the font to the fallback, and tries to return font metrics taking into account the current font and maybe some other fonts, but not the fallback.
Then I tried to bypass the whole automatic fallback system, and composed my text manually. Here is the result:
It is obviously a lot better, competing in correctness with sample number 6. Yet it is based on sample number 3, using the fallback and the standard CJK Windows font, but without letting Windows apply some freaky font transformations that don't really work.
So a possible solution would include providing a StdPlane2() function and a modified DrawTextOp function. This way the only font that you have to choose is for BMP CJK. The SIP characters are always drawn with the same font, and it is up to you to choose a font for BMP that fits from a stylistic point of view. Even if the styles don't fit, the sizes will fit a lot better, because Windows is relatively good at giving a glyph with the size you requested, and even if it is not perfect, it is going to be a lot better than in screenshot number one, samples 3 and 4.
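To make the idea more concrete, here is a rough sketch of the first step such a modified draw routine would need: splitting the text into runs by Unicode plane, so that each run can be measured and drawn with one explicitly chosen font. `PlaneRun`, `PlaneOf` and `SplitByPlane` are hypothetical names used only for illustration, and the text is assumed to be already decoded to 32-bit code points.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types and names for illustration; not existing U++ API.
struct PlaneRun {
    std::size_t begin; // index of the first code point in the run
    std::size_t count; // number of code points in the run
    int         plane; // Unicode plane: 0 = BMP, 2 = SIP, ...
};

// The plane is simply the code point divided by 0x10000.
inline int PlaneOf(char32_t cp)
{
    return static_cast<int>(cp >> 16);
}

// Groups consecutive code points that belong to the same plane.
std::vector<PlaneRun> SplitByPlane(const std::vector<char32_t>& text)
{
    std::vector<PlaneRun> runs;
    for (std::size_t i = 0; i < text.size(); i++) {
        const int plane = PlaneOf(text[i]);
        if (!runs.empty() && runs.back().plane == plane)
            runs.back().count++;
        else
            runs.push_back(PlaneRun{ i, 1, plane });
    }
    return runs;
}
```

The caller would then draw plane 0 runs with the font it selected and plane 2 runs with whatever something like the proposed StdPlane2() returns, advancing the x position by the width measured with the font that was actually used for each run.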
What do you think?
PS: And under Linux, what do you think about using some font rendering API that is a little smarter than Xft? Maybe something from GNOME or Pango? What is your attitude regarding new dependencies?
-
Attachment: test0.PNG
(Size: 2.12KB, Downloaded 974 times)
-
Attachment: test1.PNG
(Size: 1.88KB, Downloaded 1159 times)
-
Attachment: test2.PNG
(Size: 0.62KB, Downloaded 1220 times)
|
|
|
Re: 16 bits wchar [message #17354 is a reply to message #17351] |
Fri, 08 August 2008 15:32 |
|
mirek
Messages: 13980 Registered: November 2005
|
Ultimate Member |
|
|
cbpporter wrote on Fri, 08 August 2008 07:34 |
Then I continued fixing EditField navigation. And here I found some interesting problems. Using FontInfo, I keep getting wrong widths for SIP (non BMP) characters, even though it uses the right code points.
|
No surprise, FontInfo only supports BMP.
Other than that, the rest of your message indicates what a mess all this is
Quote: |
PS: And under Linux, what do you think about using some font rendering API that is a little smarter than Xft? Maybe something from gnome or pango? What is your attitude regarding new dependencies ?
|
Well, I think we will have to solve this issue in Win32 too... and the solution there will be common for both platforms.
Oh well, I think we will have to start with wchar -> int.... That will solve quite a lot of problems (I bet QTF will start working etc...). Besides, int based WString can be quite useful outside text handling too
Then we will have to look into font substitution techniques...
Mirek
|
|
|
Re: 16 bits wchar [message #17356 is a reply to message #17354] |
Fri, 08 August 2008 15:47 |
cbpporter
Messages: 1401 Registered: September 2007
|
Ultimate Contributor |
|
|
luzr wrote on Fri, 08 August 2008 16:32 |
No surprise, FontInfo only supports BMP.
Other than that, the rest of your message indicates what a mess all this is
|
Well I'm pretty sure that I fixed it to work outside of BMP, but not to handle plane based fallback fonts.
Quote: |
Oh well, I think we will have to start with wchar -> int.... That will solve quite a lot of problems (I bet QTF will start working etc...). Besides, int based WString can be quite useful outside text handling too
|
Sure, that would be good for a start. Even better would be to abstract away such details by using some kind of a string iterator class. Most processing is done by *s++ and similar constructs, and these can be emulated by fast and convenient iterators, which all return 32 bit results when used both with String and WString (and DString, and...).
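As a rough illustration of what I mean by such an iterator, here is a minimal sketch over 16-bit storage. `Utf16Reader` and `CountCodePoints` are hypothetical names, and `std::u16string` stands in for WString so that the example is self-contained.

```cpp
#include <cstddef>
#include <string>

// Hypothetical reader for illustration; std::u16string stands in for WString.
// Next() plays the role of *s++ but always yields a full 32-bit code point.
class Utf16Reader {
public:
    // The string must outlive the reader, since only a reference is kept.
    explicit Utf16Reader(const std::u16string& s) : s_(s), pos_(0) {}

    bool AtEnd() const { return pos_ >= s_.size(); }

    // Returns the next code point and advances. Must not be called at the end.
    // An unpaired surrogate is returned unchanged as a single unit.
    char32_t Next()
    {
        const char16_t hi = s_[pos_++];
        if (hi >= 0xD800 && hi <= 0xDBFF && pos_ < s_.size()) {
            const char16_t lo = s_[pos_];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                pos_++;
                return 0x10000 + ((static_cast<char32_t>(hi) - 0xD800) << 10)
                               + (static_cast<char32_t>(lo) - 0xDC00);
            }
        }
        return hi;
    }

private:
    const std::u16string& s_;
    std::size_t pos_;
};

// Example use: counting characters rather than 16-bit units.
std::size_t CountCodePoints(const std::u16string& s)
{
    Utf16Reader r(s);
    std::size_t n = 0;
    while (!r.AtEnd()) {
        r.Next();
        n++;
    }
    return n;
}
```

The same interface could be written over a UTF-8 String, so code that only walks forward through the text would not need to know which storage it is reading from.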
And for me personally, using 32 bits is pretty much out of the question for production code, because I have very strict RAM needs and I may be forced to replace String and WString with wchar[3] (not null terminated) for most of my database. I hope it doesn't come to this because that would be a terrible mess...
|
|
|
Re: 16 bits wchar [message #17357 is a reply to message #17356] |
Fri, 08 August 2008 18:25 |
|
mirek
Messages: 13980 Registered: November 2005
|
Ultimate Member |
|
|
cbpporter wrote on Fri, 08 August 2008 09:47 |
Sure, that would be good for start. Even better would be to abstract away such details by using some kind of a string iterator class. Most processing is done by *s++ and similar constructs, and these can be emulated by fast and convenient iterators, which all return 32 bit results when used both with String and WString (and DString, and...).
|
IMO it only looks simple.
Consider only the simple fact that you might want to display the column number in TheIDE.
Quote: |
And for me personally, using 32 bits is pretty much out of the question for production code, because I have very strict RAM needs and I may be forced to replace String and WString with wchar[3] (not null terminated) for most of my database. I hope it doesn't come to this because that would be a terrible mess...
|
I think WString in fact should only be used as "transient uncompressed form". Just like it already is everywhere, except EditField.
If you really have very strict memory requirements, using something like ZCompress on UTF-8 String would have superior results anyway...
Hm, OTOH, using only 3 bytes per character in WString perhaps is not that bad an idea
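Just to show what 3 bytes per character could look like, here is a quick sketch. It is not a proposal for the actual WString layout: `Pack24` and `CodePointAt` are hypothetical names and the little-endian byte order is an arbitrary choice. Unicode code points need at most 21 bits, so 24 bits are enough, and random access stays O(1).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration; not existing U++ API.
// Stores every code point in exactly 3 bytes, least significant byte first.
std::vector<std::uint8_t> Pack24(const std::vector<char32_t>& text)
{
    std::vector<std::uint8_t> out;
    out.reserve(text.size() * 3);
    for (char32_t cp : text) {
        out.push_back(static_cast<std::uint8_t>(cp & 0xFF));
        out.push_back(static_cast<std::uint8_t>((cp >> 8) & 0xFF));
        out.push_back(static_cast<std::uint8_t>((cp >> 16) & 0xFF));
    }
    return out;
}

// Reads the character at a given index. index must be < packed.size() / 3.
char32_t CodePointAt(const std::vector<std::uint8_t>& packed, std::size_t index)
{
    const std::size_t i = index * 3;
    return static_cast<char32_t>(packed[i])
         | (static_cast<char32_t>(packed[i + 1]) << 8)
         | (static_cast<char32_t>(packed[i + 2]) << 16);
}
```

Compared with 32-bit characters this saves a quarter of the memory, at the price of unaligned reads and three byte loads per character.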
Mirek
|
|
|
Re: 16 bits wchar [message #17363 is a reply to message #17357] |
Sat, 09 August 2008 01:45 |
cbpporter
Messages: 1401 Registered: September 2007
|
Ultimate Contributor |
|
|
Here is a little demo of my effort thus far. Nothing too fancy, just a window with an EditField. Keyboard navigation, editing and selecting work great, and it no longer looks like crap even though I'm using two different fonts for rendering. If you use all the keyboard shortcuts it may still be possible to mess up the cursor position, since I didn't investigate all shortcuts, and mouse selection is not fixed yet.
The predefined text consists of SIP, BMP, BMP, space, Latin, space, SIP, space, Latin, BMP, space, SIP. You will need to download HAN NOM A and HAN NOM B, and set up HAN NOM B as the plane 2 fallback font. Maybe in the future we can do a little guesswork, and if a font can print a character from a given plane and a registry setting for that plane is missing, we could still use it as a fallback only in U++.
You can find instructions here: http://winvnkey.sourceforge.net/webhelp/surrogate_fonts.htm. The Internet Explorer settings are not necessary.
edit: link was missing.
-
Attachment: TestCJK.rar
(Size: 381.38KB, Downloaded 349 times)
[Updated on: Sat, 09 August 2008 09:02]
|
|
|
|
|