U++ forum: Welcome to the forum


Re: 16 bits wchar [message #11959 is a reply to message #11951]

Thu, 04 October 2007 13:15


cbpporter
Messages: 1428
Registered: September 2007

Ultimate Contributor

OK, then we should leave Stream the way you intended. It serves its purpose well without extra buffers, and I don't want 20 variants of Stream and assorted classes with different kinds of buffers (like in Java).

So I am going to concentrate on CharSet and String. I wrote a function that checks whether a UTF-8 sequence is valid. I know that you already have such a function (I even reused most of it), but we follow different versions of the Unicode rules: mine is (or will be) compliant with the changes made in November 2003 (RFC 3629), while yours follows the older ones.
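For reference, the post-2003 rules boil down to two conditions on every decoded sequence: the code point must be a Unicode scalar value (at most U+10FFFF and not a UTF-16 surrogate in U+D800..U+DFFF), and it must be encoded in the shortest possible form. The following is a minimal standalone sketch of just those two checks; `IsScalarValue`, `ShortestUtf8Length` and `IsWellFormed` are hypothetical names used for illustration, not U++ functions.

```cpp
#include <cstdint>

// Hypothetical helper (not part of U++): true if cp is a Unicode scalar
// value, i.e. at most U+10FFFF and not a UTF-16 surrogate.
bool IsScalarValue(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical helper: number of bytes in the shortest (and therefore
// the only legal) UTF-8 encoding of cp.
int ShortestUtf8Length(uint32_t cp)
{
    if(cp < 0x80)    return 1;
    if(cp < 0x800)   return 2;
    if(cp < 0x10000) return 3;
    return 4;
}

// Hypothetical helper: a sequence of seqLen bytes that decoded to cp is
// acceptable only if cp is a scalar value and was not encoded overlong.
bool IsWellFormed(uint32_t cp, int seqLen)
{
    return IsScalarValue(cp) && ShortestUtf8Length(cp) == seqLen;
}
```

Since the lead byte already fixes the sequence length, the lower-bound checks in a decoder such as utf8check5 (`codePoint < 0x80`, `< 0x0800`, `< 0x010000`) are this shortest-form test written out once per length.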

I have tested it a little and am going to look for some test data so I can debug it fully, but it looks something like this:

#include <Core/Core.h>

using namespace Upp;

// Returns true if the len bytes at _s are well-formed UTF-8 under the
// post-2003 rules: no overlong forms, no surrogates, nothing above U+10FFFF.
bool utf8check5(const char *_s, int len)
{
	const byte *s = (const byte *)_s;
	const byte *lim = s + len;
	int codePoint = 0;
	while(s < lim) {
		word code = *s++;
		if(code >= 0x80) {
			if(code < 0xC2)   // continuation byte used as lead, or overlong C0/C1
				return false;
			else
			if(code < 0xE0) { // 2-byte sequence: U+0080..U+07FF
				if(lim - s < 1 || s[0] < 0x80 || s[0] >= 0xC0)
					return false;
				codePoint = ((code - 0xC0) << 6) + s[0] - 0x80;
				if(codePoint < 0x80 || codePoint > 0x07FF)
					return false;
				s++;
			}
			else
			if(code < 0xF0) { // 3-byte sequence: U+0800..U+FFFF
				if(lim - s < 2 ||
				   s[0] < 0x80 || s[0] >= 0xC0 ||
				   s[1] < 0x80 || s[1] >= 0xC0)
					return false;
				codePoint = ((code - 0xE0) << 12) + ((s[0] - 0x80) << 6) + s[1] - 0x80;
				if(codePoint < 0x0800 || codePoint > 0xFFFF)
					return false;
				if(codePoint >= 0xD800 && codePoint <= 0xDFFF) // UTF-16 surrogates are not allowed
					return false;
				s += 2;
			}
			else
			if(code < 0xF5) { // 4-byte sequence: U+10000..U+10FFFF
				if(lim - s < 3 ||
				   s[0] < 0x80 || s[0] >= 0xC0 ||
				   s[1] < 0x80 || s[1] >= 0xC0 ||
				   s[2] < 0x80 || s[2] >= 0xC0)
					return false;
				codePoint = ((code - 0xF0) << 18) + ((s[0] - 0x80) << 12) + ((s[1] - 0x80) << 6) + s[2] - 0x80;
				if(codePoint < 0x010000 || codePoint > 0x10FFFF)
					return false;
				s += 3;
			}
			else              // F5..FF never occur in valid UTF-8
				return false;
		}
	}
	return true;
}
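A common alternative to decoding the code point and range-checking it is to validate the raw bytes against the "Well-Formed UTF-8 Byte Sequences" table in chapter 3 of the Unicode Standard, where overlong forms, surrogates and values above U+10FFFF are all excluded by narrowing the allowed range of the second byte. The sketch below is standalone C++ rather than U++ code; `Utf8IsWellFormed` is a hypothetical name, and this is a different technique from the function above, not a replacement proposed in the thread.

```cpp
#include <cstddef>

// Hypothetical byte-range validator (not the forum code). Each lead byte
// selects how many continuation bytes follow and which range the second
// byte may take; all later continuation bytes must be in 80..BF.
bool Utf8IsWellFormed(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while(i < len) {
        unsigned char b = s[i];
        if(b < 0x80) { i++; continue; }                // ASCII

        int n;                                         // continuation bytes
        unsigned char lo = 0x80, hi = 0xBF;            // range for 2nd byte
        if(b >= 0xC2 && b <= 0xDF)        n = 1;
        else if(b == 0xE0)              { n = 2; lo = 0xA0; } // no overlong 3-byte forms
        else if(b == 0xED)              { n = 2; hi = 0x9F; } // no surrogates
        else if(b >= 0xE1 && b <= 0xEF)   n = 2;
        else if(b == 0xF0)              { n = 3; lo = 0x90; } // no overlong 4-byte forms
        else if(b == 0xF4)              { n = 3; hi = 0x8F; } // nothing above U+10FFFF
        else if(b >= 0xF1 && b <= 0xF3)   n = 3;
        else return false;                             // 80..C1 and F5..FF

        if(len - i - 1 < (size_t)n) return false;      // truncated sequence
        if(s[i + 1] < lo || s[i + 1] > hi) return false;
        for(int k = 2; k <= n; k++)
            if(s[i + k] < 0x80 || s[i + k] > 0xBF) return false;
        i += 1 + n;
    }
    return true;
}
```

Useful edge cases for testing either approach: a validator following the post-2003 rules should reject C0 80 (overlong NUL), E0 80 80 (overlong), ED A0 80 (surrogate U+D800), F4 90 80 80 (U+110000) and a truncated E2 82, and should accept EF BF BF (U+FFFF) and F0 90 80 80 (U+10000).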
