Re: Making support for code pages unlimited [message #23061 is a reply to message #23057] |
Sun, 13 September 2009 01:38 |
Mindtraveller
Messages: 917 Registered: August 2007 Location: Russia, Moscow rgn.
Experienced Contributor
OK, the first step is complete.
1. Filtered out nearly empty charsets and charsets that need more than 2 bytes of Unicode per character (these seem to be unsupported at present by the common CHARSET_* and CHRTAB_* based internal functions of U++).
2. Codepage names are taken from iconv, which is the de facto standard IMO (the previous names could be kept for backward compatibility).
3. Parsed and collected the data into 2 files (2 pieces each), which are release candidates for insertion into Charset.cpp/.h.
Please look at these files. IMO they are generally good, but some more charsets could be filtered out.
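To make the filtering in step 1 concrete, here is a minimal sketch of how such a check might look. It is not the actual parser: the names kUnmapped, kTableSize, FitsIn16Bits, MappedCount and KeepCharset, the "unmapped" marker value and the threshold of 16 mapped entries are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical marker for a byte value that has no Unicode mapping.
constexpr std::uint32_t kUnmapped = 0xFFFFFFFFu;

// Number of high-half entries (bytes 0x80..0xFF) in a single-byte code page.
constexpr std::size_t kTableSize = 128;

// Returns true if every mapped entry fits into 16 bits (2 bytes).
bool FitsIn16Bits(const std::uint32_t (&table)[kTableSize])
{
    for (std::uint32_t cp : table)
        if (cp != kUnmapped && cp > 0xFFFFu)
            return false;
    return true;
}

// Counts the entries that actually map to a Unicode code point.
std::size_t MappedCount(const std::uint32_t (&table)[kTableSize])
{
    std::size_t n = 0;
    for (std::uint32_t cp : table)
        if (cp != kUnmapped)
            ++n;
    return n;
}

// Keeps a charset only if it is not nearly empty and is 16-bit clean.
// min_mapped is an arbitrary, assumed threshold.
bool KeepCharset(const std::uint32_t (&table)[kTableSize],
                 std::size_t min_mapped = 16)
{
    return MappedCount(table) >= min_mapped && FitsIn16Bits(table);
}
```

A table that fails KeepCharset would simply be left out of the generated files.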
UPD: Sorry, the U++ forum fails to add attachments (sometimes when there is more than one, sometimes when there is more than zero). I will attach the files ASAP.
[Updated on: Sun, 13 September 2009 08:21]
Re: Making support for code pages unlimited [message #23078 is a reply to message #23062] |
Tue, 15 September 2009 22:09 |
Mindtraveller
Mirek, could you please look at the proposed sources and tell me whether I need to add anything more to get these codepages into U++?
(Maybe as a bazaar extension?)
I'd also like to add the standard multibyte code pages for languages with ideographic scripts, such as Japanese and Chinese. Is that possible? I could extend my parser to convert these arrays from the iconv sources.
[Updated on: Wed, 16 September 2009 18:00]
Re: Making support for code pages unlimited [message #23117 is a reply to message #23116] |
Thu, 17 September 2009 19:24 |
|
mirek
Messages: 13975 Registered: November 2005
Ultimate Member
Mindtraveller wrote on Thu, 17 September 2009 13:13 |
luzr wrote on Thu, 17 September 2009 19:21 |
In reality, it is perhaps not really useful... but obviously your settings are wrong
Mirek
|
What settings do you mean?
|
#ifdef PLATFORM_WIN32
....
AddCharSetE("iso8859-1", CHRTAB_ISO8859_1, CHARSET_WIN1252);
Here the last parameter, CHARSET_WIN1252, says that the Win32 equivalent of ISO8859_1 is WIN1252: it contains mostly the same characters, although not at the same code points.
Well, in fact, I think we can happily remove this info... I will check soon if that is possible.
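To illustrate the idea of a per-charset "system equivalent", here is a small standalone sketch. It is not the U++ implementation: SysCharset, CharsetEntry and CharsetRegistry are hypothetical names, and the only thing taken from the snippet above is the shape of the call (a name, a table, and the closest charset known to the host OS).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical ids for charsets the host OS understands natively.
enum class SysCharset : std::uint8_t { None, Win1252 };

struct CharsetEntry {
    std::string name;              // e.g. "iso8859-1"
    const std::uint16_t *table;    // Unicode values for bytes 0x80..0xFF
    SysCharset system_equivalent;  // closest charset offered by the OS
};

class CharsetRegistry {
public:
    // Registers a charset together with its assumed OS-level equivalent.
    void Add(std::string name, const std::uint16_t *table,
             SysCharset system_equivalent)
    {
        entries_.push_back(
            CharsetEntry{std::move(name), table, system_equivalent});
    }

    // Looks a charset up by name; returns nullptr if it is not registered.
    const CharsetEntry *Find(const std::string& name) const
    {
        for (const CharsetEntry& e : entries_)
            if (e.name == name)
                return &e;
        return nullptr;
    }

private:
    std::vector<CharsetEntry> entries_;
};
```

If the equivalent is dropped, as suggested above, the third field would simply disappear from the entry.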
Quote: |
As I understand, you accepted my addition of charsets, and I want to ask if there is a possibility to add multibyte character sets (like Chinese or Japanese) to make supported character sets list complete. I could parse and add them too, but I don't know how to add them to U++.
|
Well, that will be tricky. I think we will have to change charset.cpp internals a bit to support them.
Also, I am not sure that I want a copy of big CJK conversion tables in each application. Maybe this could be in another package (of course, somehow registering into regular charset.h API).
Re: Making support for code pages unlimited [message #23171 is a reply to message #23166] |
Tue, 22 September 2009 09:58 |
Mindtraveller
What is the problem with these tables? How could they possibly lead to a crash?
Will you make these tables an additional package, as planned?
I'd also like to ask you to change the charset.cpp internals to support multibyte charsets. Then I will port additional ISO codepages for languages like Japanese and Chinese to U++. This will make U++ support for code pages complete.
They will be handy for those who build really widely used apps with U++.
[Updated on: Tue, 22 September 2009 15:08]
Re: Making support for code pages unlimited [message #23184 is a reply to message #23171] |
Thu, 24 September 2009 10:35 |
|
mirek
I am sorry I did not have time to look into the crashing tables sooner; now I have.
The problem with these crashes was mostly artificial: there were additional checks meant to debug problems in the tables:
- a check that none of the characters in the table is <128 (the problem in ARMSCII_8)
- a check that there are no duplicates (CP1161).
After removing the checks, everything seems to be OK now.
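For reference, the two checks described above can be sketched roughly as follows. This is not the code that was removed from charset.cpp; kEntries, TableProblem and CheckTable are made-up names, and the table layout (128 16-bit entries for bytes 0x80..0xFF) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Assumed layout: one 16-bit Unicode value per byte in 0x80..0xFF.
constexpr std::size_t kEntries = 128;

enum class TableProblem { None, LowCodePoint, Duplicate };

// Reports the first problem found in a conversion table.
TableProblem CheckTable(const std::uint16_t (&table)[kEntries])
{
    std::unordered_set<std::uint16_t> seen;
    for (std::uint16_t cp : table) {
        // The high half should not map back into the ASCII range.
        if (cp < 128)
            return TableProblem::LowCodePoint;
        // Each Unicode value should appear only once in the table.
        if (!seen.insert(cp).second)
            return TableProblem::Duplicate;
    }
    return TableProblem::None;
}
```

A table like ARMSCII_8 would trip the first test and CP1161 the second, which is why such checks only make sense as a debugging aid and not as a hard failure.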
As for multibyte character sets...
There is a problem of sorts, because some of the charset.h functions expect a single character.
So I guess all we can do is add some sort of hook into the 'whole string' functions, which would then be extended/reimplemented in an "MBCS" package.
Maybe something like:
void RegisterMBCS(byte charset,
                  WString (*tounicode)(const char *s, int len),
                  String (*fromunicode)(const wchar *s, int len));
or maybe rather
void RegisterMBCS(byte charset,
                  WString (*tounicode)(const char *s, int len, int charset),
                  String (*fromunicode)(const wchar *s, int len, int charset));
or even
void RegisterMBCS(byte charset, void *param,
                  WString (*tounicode)(const char *s, int len, void *param),
                  String (*fromunicode)(const wchar *s, int len, void *param));
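A rough standalone sketch of how the last variant might be wired up is given below. It only illustrates the hook idea and is not a proposal for the real charset.cpp: Byte, WideString and ByteString are stand-ins for the U++ byte, WString and String types, and MbcsHooks, MbcsTable, MbcsToUnicode and MbcsFromUnicode are invented helper names.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Stand-ins for the U++ types byte, WString and String (assumptions).
using Byte = std::uint8_t;
using WideString = std::u16string;
using ByteString = std::string;

using ToUnicodeFn = WideString (*)(const char *s, int len, void *param);
using FromUnicodeFn = ByteString (*)(const char16_t *s, int len, void *param);

// Hooks registered for one multibyte charset.
struct MbcsHooks {
    void *param;
    ToUnicodeFn tounicode;
    FromUnicodeFn fromunicode;
};

// Hypothetical global table of hooks, keyed by charset id.
std::map<Byte, MbcsHooks>& MbcsTable()
{
    static std::map<Byte, MbcsHooks> table;
    return table;
}

void RegisterMBCS(Byte charset, void *param,
                  ToUnicodeFn tounicode, FromUnicodeFn fromunicode)
{
    MbcsTable()[charset] = MbcsHooks{param, tounicode, fromunicode};
}

// 'Whole string' conversion: uses the hook if one is registered and
// returns false otherwise, so the caller can fall back to the tables.
bool MbcsToUnicode(Byte charset, const char *s, int len, WideString& out)
{
    auto it = MbcsTable().find(charset);
    if (it == MbcsTable().end())
        return false;
    out = it->second.tounicode(s, len, it->second.param);
    return true;
}

bool MbcsFromUnicode(Byte charset, const char16_t *s, int len, ByteString& out)
{
    auto it = MbcsTable().find(charset);
    if (it == MbcsTable().end())
        return false;
    out = it->second.fromunicode(s, len, it->second.param);
    return true;
}
```

With the void *param variant, one pair of conversion functions could serve several charsets by receiving a different table pointer for each registration.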
Mirek
[Updated on: Thu, 24 September 2009 10:36]