Re: Making support for code pages unlimited [message #25089 is a reply to message #23056]
Tue, 09 February 2010 23:45
Mindtraveller
Recently I had some time to analyze encodings, and I ran into a couple of issues worth discussing.
First, I discovered that the CHARSET_**** tables are not in Unicode but in UTF-8. If that is so, I should rebuild the proposed tables accordingly.
The second issue is a simple exercise that actually failed. Just look at this code:
CONSOLE_APP_MAIN
{
    Cout() << ToCharset(CHARSET_CP866, "Всем привет!", CHARSET_UTF8);
}
I ran this example as a console app under Windows XP, where my "native" console code page is CP866. But instead of Cyrillic letters, the program printed garbage. My question: what is wrong with this example, and why does it fail to convert the characters properly?
P.S. I tried updating the CHARSET_CP866 array to contain UTF-8 encoded symbols, but this little example still fails to convert the characters properly.
[Updated on: Tue, 09 February 2010 23:49]
Re: Making support for code pages unlimited [message #25182 is a reply to message #25089]
Sat, 13 February 2010 13:58
mirek
Mindtraveller wrote on Tue, 09 February 2010 17:45:
    Recently I had some time to analyze encodings, and I ran into a couple of issues worth discussing.
    First, I discovered that the CHARSET_**** tables are not in Unicode but in UTF-8. If that is so, I should rebuild the proposed tables accordingly.
What makes you think that? Those tables are just UTF-16 codes for characters 128-255.
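To illustrate what such a table looks like, here is a minimal standalone sketch. It does not use the U++ tables or ToCharset; the names BuildCp866Table and Utf8ToCp866 are hypothetical, and only the Cyrillic part of CP866 is filled in (box-drawing and other symbols are left as 0, meaning "not covered here"). Each entry holds the 16-bit Unicode code for one byte in the range 128-255, and conversion from UTF-8 is a decode followed by a reverse lookup.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: 16-bit Unicode codes for CP866 bytes 128..255.
// Only the Cyrillic letters are filled in; 0 means "not covered by this sketch".
static std::array<uint16_t, 128> BuildCp866Table()
{
    std::array<uint16_t, 128> t{};
    for (int b = 0x80; b <= 0xAF; ++b)       // U+0410..U+043F
        t[b - 0x80] = static_cast<uint16_t>(0x0410 + (b - 0x80));
    for (int b = 0xE0; b <= 0xEF; ++b)       // U+0440..U+044F
        t[b - 0x80] = static_cast<uint16_t>(0x0440 + (b - 0xE0));
    t[0xF0 - 0x80] = 0x0401;                 // CYRILLIC CAPITAL LETTER IO
    t[0xF1 - 0x80] = 0x0451;                 // CYRILLIC SMALL LETTER IO
    // Sanity check on the range arithmetic above.
    assert(t[0] == 0x0410 && t[0xEF - 0x80] == 0x044F);
    return t;
}

// Hypothetical helper: convert UTF-8 text to CP866 bytes.
// Characters missing from the table become '?'. Continuation bytes are
// not validated; this is a sketch, not a strict decoder.
std::string Utf8ToCp866(const std::string& s)
{
    static const std::array<uint16_t, 128> table = BuildCp866Table();
    std::string out;
    for (std::size_t i = 0; i < s.size(); ) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        uint32_t cp;
        std::size_t len;
        if (c < 0x80)                { cp = c;        len = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; }
        else { out += '?'; ++i; continue; }           // stray byte
        if (i + len > s.size()) { out += '?'; break; } // truncated sequence
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        i += len;
        if (cp < 0x80) { out += static_cast<char>(cp); continue; }
        char mapped = '?';
        for (int j = 0; j < 128; ++j)                  // reverse lookup
            if (table[j] != 0 && table[j] == cp) {
                mapped = static_cast<char>(0x80 + j);
                break;
            }
        out += mapped;
    }
    return out;
}
```

With this sketch, the UTF-8 text "Всем привет!" maps to one CP866 byte per letter, which is what a console set to code page 866 would need to receive.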
Quote:
    The second issue is a simple exercise that actually failed. Just look at this code:
    CONSOLE_APP_MAIN
    {
        Cout() << ToCharset(CHARSET_CP866, "Всем привет!", CHARSET_UTF8);
    }
    I ran this example as a console app under Windows XP, where my "native" console code page is CP866. But instead of Cyrillic letters, the program printed garbage. My question: what is wrong with this example, and why does it fail to convert the characters properly?
Well, it looks like things are more complicated. I believe our part is OK, but the console simply expects the output to be in a different charset. E.g.:
http://illegalargumentexception.blogspot.com/2009/04/i18n-unicode-at-windows-command-prompt.html
However, thinking about it more, I believe we should perhaps use the Unicode variant of WriteFile and convert the current app encoding (which is UTF-8 by default) to Unicode.
In that case, however, your example would work without ToCharset...
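A rough sketch of that idea, outside of U++: it assumes the "Unicode variant" means the Win32 WriteConsoleW call, which takes UTF-16 text. Utf8ToUtf16 and PutConsole are hypothetical names, not U++ functions. The conversion part is portable; the console call is compiled only on Windows, and the raw UTF-8 bytes are written to stdout when it is unavailable or fails (for example, when output is redirected to a file).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: decode UTF-8 and re-encode as UTF-16.
// Malformed input yields U+FFFD; continuation bytes are not validated.
std::u16string Utf8ToUtf16(const std::string& s)
{
    std::u16string out;
    for (std::size_t i = 0; i < s.size(); ) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        uint32_t cp;
        std::size_t len;
        if (c < 0x80)                { cp = c;        len = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; }
        else { out += char16_t(0xFFFD); ++i; continue; }
        if (i + len > s.size()) { out += char16_t(0xFFFD); break; }
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        i += len;
        if (cp > 0x10FFFF) {
            out += char16_t(0xFFFD);          // outside the Unicode range
        } else if (cp >= 0x10000) {
            cp -= 0x10000;                    // encode as a surrogate pair
            assert(cp <= 0xFFFFF);
            out += char16_t(0xD800 + (cp >> 10));
            out += char16_t(0xDC00 + (cp & 0x3FF));
        } else {
            out += char16_t(cp);
        }
    }
    return out;
}

// Hypothetical helper: print UTF-8 text to the console.
void PutConsole(const std::string& utf8)
{
#ifdef _WIN32
    std::u16string u = Utf8ToUtf16(utf8);
    std::wstring w(u.begin(), u.end());       // wchar_t is 16-bit on Windows
    DWORD written = 0;
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    if (WriteConsoleW(h, w.c_str(), static_cast<DWORD>(w.size()), &written, nullptr))
        return;
#endif
    // Fallback: no console available, so pass the UTF-8 bytes through.
    std::fwrite(utf8.data(), 1, utf8.size(), stdout);
}
```

Calling PutConsole("Всем привет!") from a UTF-8 source file would then need no explicit charset conversion in the application code, which matches the point above that ToCharset would become unnecessary.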
Mirek