U++ forum: Welcome to the forum


Re: 16 bits wchar [message #11809 is a reply to message #11797]

Wed, 26 September 2007 14:55

sergei

cbpporter wrote on Wed, 26 September 2007 07:43

sergei wrote on Wed, 26 September 2007 01:56

I didn't mention that I tested basic read/write performance. UTF handling would add overhead to the 8-bit and 16-bit formats, but not to the 32-bit format. I also remembered the UTF8-EE issue. UTF-32 could solve it easily. IIRC only 21 bits are needed for full Unicode, so there's plenty of space to escape to (without taking over the private use space).

The only problem with UTF-32 is the storage space. It is two to four times the size of UTF-8 and almost always double the size of UTF-16. And I don't think that UTF-8EE is such a big issue; you just have to make sure to use a more permissive validation scheme. And what is RTL anyway?

Well, 4 MB of memory would hold 1 million characters. Do you typically need more, even for a rather complex GUI app? With 512 MB or 1 GB of memory on many computers and 200 GB hard drives, I don't think space is a serious issue now. I was more worried about performance: memory allocation and access are somewhat slower (but not always; for sizes of 256 to 8k it's quite good).
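To put numbers on the size argument, here is a minimal sketch that counts how many bytes the same sequence of code points takes in each encoding form. The names (`EncodedSizes`, `MeasureSizes`) are made up for illustration and are not part of U++; the code assumes every input value is a valid code point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte counts for one text in the three encoding forms.
struct EncodedSizes {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

// Bytes needed for one code point in UTF-8 (1 to 4).
inline std::size_t Utf8Bytes(std::uint32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Bytes needed for one code point in UTF-16 (one or two 16-bit units).
inline std::size_t Utf16Bytes(std::uint32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

// Sum the per-code-point costs; UTF-32 is always 4 bytes per code point.
inline EncodedSizes MeasureSizes(const std::vector<std::uint32_t>& text) {
    EncodedSizes sizes{0, 0, 0};
    for (std::uint32_t cp : text) {
        assert(cp <= 0x10FFFF);  // assumption: input is within the Unicode range
        sizes.utf8 += Utf8Bytes(cp);
        sizes.utf16 += Utf16Bytes(cp);
        sizes.utf32 += 4;
    }
    return sizes;
}
```

Calling `MeasureSizes` on a vector of one million ASCII code points gives 1,000,000 bytes of UTF-8, 2,000,000 of UTF-16 and 4,000,000 of UTF-32, which is the 4 MB figure above. The gap narrows as the text moves away from ASCII.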

The issue isn't UTF-8EE; that's more of a side effect. The main gain is that a char equals a cell. That is, an LString (or whatever the name ends up being) can always be treated as UTF-32, unlike a WString, which might be 20 wchars or a UTF-16 string with an unknown number of characters. It's even worse with UTF-8, where the String length would almost always differ from the number of characters stored. Replacing a char is a trivial operation in UTF-32, but might require shifting in UTF-8/16 (if the chars take different amounts of space). Searching for a char from the end (backwards) would require testing whether every match is actually the second/third/fourth unit of some sequence. Actually, even simpler: how do you supply a multibyte char to some search/replace function in UTF-8/16? As an integer? That would require a conversion for every operation.
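A rough sketch of the replace case, using `std::u32string` and `std::string` as stand-ins for the proposed LString and for a UTF-8 String. The function names are hypothetical, and the UTF-8 side assumes the input is already well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// UTF-32: one element per code point, so replacing is a single store.
inline void ReplaceCharUtf32(std::u32string& s, std::size_t index, char32_t cp) {
    assert(index < s.size());
    s[index] = cp;
}

// Encode one code point as UTF-8 (assumes cp is a valid scalar value).
inline std::string EncodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
inline bool IsContinuation(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// UTF-8: walk to the index-th code point, measure the old sequence,
// then splice in the new one. The tail shifts if the lengths differ.
// Does nothing if index is past the last code point.
inline void ReplaceCharUtf8(std::string& s, std::size_t index, char32_t cp) {
    std::size_t pos = 0;
    while (index > 0 && pos < s.size()) {
        ++pos;
        while (pos < s.size() && IsContinuation(s[pos])) ++pos;
        --index;
    }
    if (pos >= s.size()) return;
    std::size_t end = pos + 1;
    while (end < s.size() && IsContinuation(s[end])) ++end;
    s.replace(pos, end - pos, EncodeUtf8(cp));
}
```

Replacing the middle char of "abc" with U+20AC leaves the UTF-32 string at 3 elements, while the UTF-8 string grows from 3 to 5 bytes and has to be scanned from the start to find the position.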

Unlike the current situation, where String is either a sequence of chars OR a UTF-8 string, LString would always be a sequence of ints/unsigned ints AND a UTF-32 string. String could be kept for single-byte storage (like data from a file or ASCII-only strings), WString for OS interop, and LString could supply conversions to/from both.
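As a sketch of what the WString side of those conversions could look like, here is a UTF-32 to UTF-16 round trip using standard string types. `LString` is only the hypothetical name from this post, aliased to `std::u32string`; nothing here is existing U++ API. Unpaired surrogates are passed through unchanged, which is an assumption of this sketch rather than a recommendation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

using LString = std::u32string;  // hypothetical: one element per code point

// UTF-32 -> UTF-16: code points above U+FFFF become a surrogate pair.
inline std::u16string ToUtf16(const LString& s) {
    std::u16string out;
    for (char32_t cp : s) {
        assert(cp <= 0x10FFFF);  // assumption: input is within the Unicode range
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            char32_t v = cp - 0x10000;
            out += static_cast<char16_t>(0xD800 + (v >> 10));
            out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));
        }
    }
    return out;
}

// UTF-16 -> UTF-32: a high surrogate followed by a low surrogate is
// combined into one code point; anything else is copied as-is.
inline LString FromUtf16(const std::u16string& s) {
    LString out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t unit = s[i];
        bool high = unit >= 0xD800 && unit <= 0xDBFF;
        if (high && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            char32_t low = s[++i];
            unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
        }
        out += unit;
    }
    return out;
}
```

For a string holding 'A' and U+1F600, the LString has 2 elements while the UTF-16 form has 3 units, so only the LString length matches the character count. The UTF-8 direction would follow the same pattern.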
