Re: Making support for code pages unlimited [message #23191 is a reply to message #23190] Thu, 24 September 2009 11:50
mirek
Messages: 13975
Registered: November 2005
Ultimate Member
Anyway, I guess the hard part here is to implement some MbcsToUnicode and MbcsFromUnicode; I can glue them into charset.h easily myself...
Re: Making support for code pages unlimited [message #23193 is a reply to message #23191] Thu, 24 September 2009 22:05
Mindtraveller
Messages: 917
Registered: August 2007
Location: Russia, Moscow rgn.
Experienced Contributor

Thank you. Is there a possibility to add support for the newly added charsets and MBCS to QTF?
Re: Making support for code pages unlimited [message #23197 is a reply to message #23193] Fri, 25 September 2009 10:28
mirek
Mindtraveller wrote on Thu, 24 September 2009 16:05

Thank you. Is there a possibility to add support for the newly added charsets and MBCS to QTF?


There is always a possibility, but in this particular case I do not see much sense...

After checking QTF code:

Maybe adding something like

int FetchChar(char *&ptr);

- reads a multibyte character from the source pointer and moves the pointer as needed

would be helpful. Or I can change the code a bit more and just use block conversion (from above api).
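
For illustration only, here is a rough sketch of what such a FetchChar could look like if the source were UTF-8. This is not U++ code: the `const char *&` signature, the -1 error value and the decision not to reject overlong forms or surrogates are all assumptions made for this example, and a real MBCS charset would need its own lead-byte rules.

```cpp
// Hypothetical sketch: decodes one UTF-8 sequence at ptr and advances ptr past it.
// Returns the code point, or -1 for a malformed sequence (ptr then advances by one byte).
// The caller is expected to check *ptr != '\0' before calling.
int FetchChar(const char *&ptr)
{
    const unsigned char *s = reinterpret_cast<const unsigned char *>(ptr);
    unsigned char lead = s[0];
    int len, code;
    if (lead < 0x80)                { len = 1; code = lead; }         // plain ASCII
    else if ((lead & 0xE0) == 0xC0) { len = 2; code = lead & 0x1F; }  // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) { len = 3; code = lead & 0x0F; }  // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) { len = 4; code = lead & 0x07; }  // 11110xxx
    else { ++ptr; return -1; }                                        // stray continuation byte
    for (int i = 1; i < len; ++i) {
        // A terminating '\0' fails this check, so we never read past the end of the string.
        if ((s[i] & 0xC0) != 0x80) { ++ptr; return -1; }
        code = (code << 6) | (s[i] & 0x3F);
    }
    ptr += len;
    return code;
}
```

A caller would loop with `while (*p) { int c = FetchChar(p); ... }`, so the character width stays hidden behind the one function.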
Re: Making support for code pages unlimited [message #23207 is a reply to message #23197] Fri, 25 September 2009 16:56
Mindtraveller

OK, Mirek. Maybe you are right. In that case, I will consider converting the text to UTF-8 first and then to QTF.
Re: Making support for code pages unlimited [message #23233 is a reply to message #23207] Thu, 01 October 2009 23:21
Mindtraveller

Recently I tried to use the newly added charset:
Cout() << ToCharset(CHARSET_CP866, "Простая демка", CHARSET_UTF8);

which should have displayed Russian characters in a Windows console app (the console natively uses CP866). But my output was a number of "error" symbols.
I don't know whether it is a bug or I did something wrong. How can I convert my UTF-8 string to CP866?

Also, I'd like to ask Mirek for a simple example of how I could add an MBCS charset to U++.
Re: Making support for code pages unlimited [message #23247 is a reply to message #23233] Sat, 03 October 2009 21:11
mirek
Mindtraveller wrote on Thu, 01 October 2009 17:21

Recently I tried to use the newly added charset:
Cout() << ToCharset(CHARSET_CP866, "Простая демка", CHARSET_UTF8);

which should have displayed Russian characters in a Windows console app (the console natively uses CP866).



Maybe it does not.

IMO the simple way is to create a testcase that stores the result in a file, then view the file. If it does not work, you will have a testcase for me :)

Quote:


Also, I'd like to ask Mirek for a simple example of how I could add an MBCS charset to U++.



I guess we agreed to "make routines to convert To and From Unicode wchar *, len and I will add it to U++".... ?

Mirek
Re: Making support for code pages unlimited [message #25089 is a reply to message #23056] Tue, 09 February 2010 23:45
Mindtraveller

Recently I had some time to analyze encodings, and I ran into a number of issues worth discussing.

First, I discovered that the CHARSET_**** tables are not in Unicode but in UTF-8. If that is so, I should rebuild the proposed tables accordingly.

The second issue is a simple exercise that actually failed. Just look at this code:
CONSOLE_APP_MAIN
{
	Cout() << ToCharset(CHARSET_CP866, "Всем привет!", CHARSET_UTF8);
}
I ran this example as a console app under Windows XP. My "native" console code page under XP is CP866. But when the program runs, I see garbage symbols instead of Cyrillic letters. My question: what is wrong in this example, and why do you think it fails to convert the symbols properly?

P.S. I tried updating the CHARSET_CP866 array to contain UTF-8 encoded symbols, but this little example still fails to convert the symbols properly.

[Updated on: Tue, 09 February 2010 23:49]

Re: Making support for code pages unlimited [message #25182 is a reply to message #25089] Sat, 13 February 2010 13:58
mirek
Mindtraveller wrote on Tue, 09 February 2010 17:45

Recently I had some time to analyze encodings, and I ran into a number of issues worth discussing.

First, I discovered that the CHARSET_**** tables are not in Unicode but in UTF-8. If that is so, I should rebuild the proposed tables accordingly.



What makes you think that? Those tables are just UTF-16 codes for characters 128-255.
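
To make the table idea concrete, here is a small standalone sketch (plain C++, not the actual U++ charset code). It assumes a table of 128 UTF-16 codes for bytes 128-255 and fills in only the Cyrillic letters of CP866; the box-drawing and symbol positions are left as 0, meaning "unmapped in this sketch". The names Cp866Table, ToUnicode, FromUnicode and ToCp866 are made up for the example.

```cpp
#include <cstdint>
#include <string>

// Partial CP866 upper half as UTF-16 codes, indexed by (byte - 0x80).
// 0 marks entries this sketch leaves unmapped (0xB0-0xDF and 0xF2-0xFF).
struct Cp866Table {
    uint16_t code[128];
    Cp866Table() {
        for (int i = 0; i < 128; ++i) code[i] = 0;
        for (int i = 0; i < 0x30; ++i) code[i] = 0x0410 + i;         // 0x80-0xAF: U+0410..U+043F
        for (int i = 0; i < 0x10; ++i) code[0x60 + i] = 0x0440 + i;  // 0xE0-0xEF: U+0440..U+044F
        code[0x70] = 0x0401;                                         // 0xF0: U+0401
        code[0x71] = 0x0451;                                         // 0xF1: U+0451
    }
};

// Byte -> UTF-16 code: ASCII maps to itself, the upper half goes through the table.
uint16_t ToUnicode(unsigned char c, const Cp866Table& t)
{
    return c < 0x80 ? c : t.code[c - 0x80];
}

// UTF-16 code -> byte: reverse lookup, with defchar for anything the table lacks.
unsigned char FromUnicode(uint16_t wc, const Cp866Table& t, unsigned char defchar = '?')
{
    if (wc < 0x80) return (unsigned char)wc;
    for (int i = 0; i < 128; ++i)
        if (t.code[i] == wc) return (unsigned char)(0x80 + i);
    return defchar;
}

// Converts a whole UTF-16 string to CP866 bytes.
std::string ToCp866(const std::u16string& text, const Cp866Table& t)
{
    std::string out;
    for (char16_t wc : text)
        out += (char)FromUnicode((uint16_t)wc, t);
    return out;
}
```

With this table, "Всем привет!" should come out as the bytes 82 E1 A5 AC 20 AF E0 A8 A2 A5 E2 21, which gives a reference to compare a hex dump of the real conversion against.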

Quote:


The second issue is a simple exercise that actually failed. Just look at this code:
CONSOLE_APP_MAIN
{
	Cout() << ToCharset(CHARSET_CP866, "Всем привет!", CHARSET_UTF8);
}
I ran this example as a console app under Windows XP. My "native" console code page under XP is CP866. But when the program runs, I see garbage symbols instead of Cyrillic letters. My question: what is wrong in this example, and why do you think it fails to convert the symbols properly?



Well, it looks like things are more complicated. I believe our part is OK, but the console simply expects the output to be in a different charset. E.g.:

http://illegalargumentexception.blogspot.com/2009/04/i18n-unicode-at-windows-command-prompt.html

However, thinking about it more, I believe we should perhaps use the Unicode variant of WriteFile and convert the current app encoding (which is UTF-8 by default) to Unicode.

In that case, however, your example would work without ToCharset... :)
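
A rough sketch of that idea in plain Win32 terms follows; it is not what U++ does. WriteFile itself is byte-oriented, so the sketch assumes the wide-character console call WriteConsoleW is what is meant, and the helper name WriteUtf8ToConsole is made up. When stdout is redirected, or on a non-Windows platform, it simply passes the UTF-8 bytes through.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: prints UTF-8 text, using the wide-character console API
// on Windows when stdout is a real console. Returns true if the write succeeded.
bool WriteUtf8ToConsole(const std::string& utf8)
{
    if (utf8.empty())
        return true;
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode;
    if (GetConsoleMode(out, &mode)) {  // succeeds only for a console, not a file or pipe
        // First call asks for the required length in UTF-16 units, second call converts.
        int n = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(), nullptr, 0);
        if (n <= 0)
            return false;
        std::wstring wide(n, L'\0');
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(), &wide[0], n);
        DWORD written = 0;
        return WriteConsoleW(out, wide.data(), (DWORD)n, &written, nullptr) != 0;
    }
#endif
    // Redirected output or non-Windows platform: pass the bytes through unchanged.
    return std::fwrite(utf8.data(), 1, utf8.size(), stdout) == utf8.size();
}
```

The point of going through the wide API is that the console's current code page no longer matters, so the application would not need to know about CP866 at all.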

Mirek
Re: Making support for code pages unlimited [message #25207 is a reply to message #25182] Sun, 14 February 2010 01:15
Mindtraveller

luzr wrote on Sat, 13 February 2010 15:58

I believe our part is OK, but the console simply expects the output to be in a different charset.
I believe it's not. This code
	Cout() << "\n";
	String s = ToCharset(CHARSET_CP866, "Всем привет!", CHARSET_UTF8);
	for (int i=0; i<s.GetLength(); ++i)
		Cout() << Format("%02X ", s[i]);

gives output:
Quote:

1F 1F 1F 1F 20 1F 1F 1F 1F 1F 1F 21

Looks like our problem.