Home » U++ Library support » U++ Libraries and TheIDE: i18n, Unicode and Internationalization » 16 bits wchar
Re: 16 bits wchar [message #11785 is a reply to message #8059] |
Tue, 25 September 2007 22:03   |
cbpporter
Messages: 1427 Registered: September 2007
|
Ultimate Contributor |
|
|
I was quite unhappy when I found out that U++ is not Unicode standard compliant with it's "UTF-16" (what it implements is actually UCS-2). There are o lot of programs with poor Unicode support, which is partyially because STL doesn't support full Unicode either.
In theory it would be quite unforgivable for an application to handle just a subset of the standard. But how does the situation look in theory?
To answer this question I did a number of relatively thorough tests which took me about two hours. I used my computer at work, which has a Windows XP SP2 operating system. The first part was to determine if the OS supports surrogate pairs. After some testing (and research) I found that surrogate pairs can be enabled easily and are enabled by default. Windows has no problems theoretically to use the kind of characters (but individual pieces of software can). Next I found a font which displays about 20000 characters with codes about 0xFFFF, I installed them, and surprise surprise, it worked.
Next I tested a couple of applications. At first I wanted to give exact results, but I found it boring to write them and concluded that you will find it boring to read them. In short, Notepad and WordPad both display correctly and identify two code-units as one code point. Opera doesn’t identify code points correctly in some files and it can no do copy operations (it will truncate only to the lower 16 bits). Internet Explorer works correctly, but it couldn't use the correct registry entries to display the characters, so it used a little black rectangle. And the viewer from Total Commander is really ill equipped for these kinds of tasks.
Next I would like to test U++, but I get strange results when trying to find the length of a string when using only normal characters (with codes below 0xFFFF).
I took one of the examples and slightly modified it:
#include <Core/Core.h>
using namespace Upp;
CONSOLE_APP_MAIN
{
SetDefaultCharset(CHARSET_UTF8);
WString x = "";
for(int i = 280; i < 300; i++)
x.Cat(i);
DUMP(x);
DUMP(x.GetLength());
DUMP(x.GetCount());
String y = x.ToString();
DUMP(y);
DUMP(y.GetLength());
DUMP(y.GetCount());
y.Cat(" (appended)");
x = y.ToWString();
DUMP(x);
DUMP(x.GetLength());
DUMP(x.GetCount());
}
I got these results:
x = ─ÿ─Ö─Ü─¢─£─¥─₧─ƒ─á─í─ó─ú─ñ─Ñ─ª─º─¿─⌐──½
x.GetLength() = 20
x.GetCount() = 20
y = ─ÿ─Ö─Ü─¢─£─¥─₧─ƒ─á─í─ó─ú─ñ─Ñ─ª─º─¿─⌐──½
y.GetLength() = 40
y.GetCount() = 40
x = ─ÿ─Ö─Ü─¢─£─¥─₧─ƒ─á─í─ó─ú─ñ─Ñ─ª─º─¿─⌐──½ (appended)
x.GetLength() = 31
x.GetCount() = 31
Except the fact that the cars are mangled, the lengths doesn't seem to be ok. I may have understood incorrectly, but AFAIK GetLength should return the length in code units and GgtCount the number of real characters, so code points.
I also started researching the exact encoding methods of UTF and I will add full Unicode support to strings. It will be for personal use, but if anybody is interested I will post my results. Right now I'm trying to find and efficient way to index multichar strings. I think I will have to use iterators instead.
|
|
|
 |
|
16 bits wchar
By: riri on Mon, 05 February 2007 17:19
|
 |
|
Re: 16 bits wchar
By: mirek on Mon, 05 February 2007 23:07
|
 |
|
Re: 16 bits wchar
By: cbpporter on Tue, 25 September 2007 22:03
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 25 September 2007 23:18
|
 |
|
Re: 16 bits wchar
By: cbpporter on Wed, 26 September 2007 07:43
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 26 September 2007 08:48
|
 |
|
Re: 16 bits wchar
By: sergei on Wed, 26 September 2007 14:55
|
 |
|
Re: 16 bits wchar
By: cbpporter on Wed, 26 September 2007 15:37
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 26 September 2007 22:40
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Mon, 01 October 2007 14:28
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 03 October 2007 10:11
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 03 October 2007 10:42
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 03 October 2007 10:26
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 03 October 2007 12:10
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 03 October 2007 21:40
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Thu, 04 October 2007 17:33
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Fri, 12 October 2007 11:52
|
 |
|
Re: 16 bits wchar
By: mirek on Fri, 12 October 2007 11:59
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Fri, 12 October 2007 17:03
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Sun, 21 October 2007 20:19
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Sun, 21 October 2007 23:57
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Mon, 22 October 2007 10:47
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Mon, 22 October 2007 19:37
|
 |
|
Re: 16 bits wchar
By: mirek on Sun, 21 October 2007 20:14
|
 |
|
Re: 16 bits wchar
By: sergei on Wed, 26 September 2007 01:56
|
 |
|
Re: 16 bits wchar
By: sergei on Wed, 26 September 2007 16:54
|
 |
|
Re: 16 bits wchar
By: cbpporter on Wed, 26 September 2007 19:11
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 24 October 2007 13:27
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Sat, 27 October 2007 11:11
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Fri, 09 November 2007 10:39
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Sun, 11 November 2007 18:45
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Wed, 23 July 2008 22:04
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Mon, 04 August 2008 15:07
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Mon, 04 August 2008 17:14
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 00:03
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 00:14
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 00:20
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 00:26
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 00:51
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 10:42
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 15:12
|
 |
|
Re: 16 bits wchar
By: mirek on Tue, 05 August 2008 15:19
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Thu, 07 August 2008 16:10
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Thu, 07 August 2008 17:40
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Thu, 07 August 2008 20:01
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Fri, 08 August 2008 15:32
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: mirek on Fri, 08 August 2008 18:25
|
 |
|
Re: 16 bits wchar
|
 |
|
Re: 16 bits wchar
By: cbpporter on Fri, 05 September 2008 19:13
|
 |
|
Re: 16 bits wchar
By: mirek on Sun, 07 September 2008 13:24
|
 |
|
Re: 16 bits wchar
By: mirek on Mon, 04 August 2008 15:03
|
 |
|
Re: 16 bits wchar
By: mirek on Sat, 27 October 2007 11:01
|
Goto Forum:
Current Time: Fri Jul 04 07:43:52 CEST 2025
Total time taken to generate the page: 0.04973 seconds
|