U++ forum: Welcome to the forum

A new function to Web Package Unicode-Escape-Javascript -> Unicode [message #33247]

Wed, 20 July 2011 09:28

sergeynikitin
Messages: 748
Registered: January 2008
Location: Moscow, Russia

Contributor

I propose adding a new function to the Web package: Unicode-Escape-Javascript -> Unicode.

JavaScript uses a special escape encoding for non-Latin (international) characters. It looks like this: \u0410\u0422\u0417 ...
To convert this encoding to Unicode I needed a new function. I did not find one, so I wrote this in haste.

Maybe someone else will need it.

I also want to note that some of the operations inside the function can be optimized. For example, the repeated multiplication by 16 can be replaced by bit shifts.

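String Javascript2Unicode(String s) {
	String res;
	// Match \uXXXX with four hex digits in either case; the group captures the digits.
	RegExp reg("\\\\u([0-9a-fA-F]{4})", RegExp::MULTILINE);

	reg.Clear();

	int i_start, i_end, i_old = 0;
	while (reg.GlobalMatch(s)) {
		reg.GetMatchPos(0, i_start, i_end);
		// Copy the text before the escape; the -2 leaves out the "\u" prefix.
		res << s.Mid(i_old, i_start - i_old - 2);
		wchar wc[2] = {0, 0};
		WString ws;
		String m = (String)reg[0];
		// ctoi converts one hex digit to its value; StrInt parses decimal only,
		// so it cannot handle the digits a-f.
		wc[0] = ctoi(m[0])*16*16*16
				+ctoi(m[1])*16*16
				+ctoi(m[2])*16
				+ctoi(m[3]);

		ws = wc;
		String ss = ws.ToString();
		res << ss;
		i_old = i_end;
	}
	// Append the text that follows the last escape sequence.
	res << s.Mid(i_old);
	return res;
}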
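
The bit-shift optimization mentioned above can be sketched in standard C++, without any U++ types. HexDigit and ParseHex4 are hypothetical helper names used only for illustration; they are not part of the Web package. The sketch only shows that four hex digits can be combined with shifts instead of repeated multiplication by 16.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
int HexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: parse exactly four hex digits starting at pos.
// Returns the 16-bit value, or -1 if four hex digits are not available there.
int ParseHex4(const std::string& s, std::size_t pos = 0) {
    if (pos + 4 > s.size()) return -1;
    int value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        int d = HexDigit(s[pos + i]);
        if (d < 0) return -1;
        value = (value << 4) | d;  // same result as value * 16 + d
    }
    return value;
}
```

Returning -1 for malformed input lets the caller copy the text through unchanged instead of emitting a wrong character.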

P.S.
Maybe something like this already exists and I wasted 4 hours? Please tell me if you know of one.

SergeyNikitin<U++>( linux, wine )
{
under( Ubuntu || Debian || Raspbian );
}


Re: A new function to Web Package Unicode-Escape-Javascript -> Unicode [message #33256 is a reply to message #33247]

Wed, 20 July 2011 19:23

sergeynikitin
Sender Ghost wrote PM on Wed, 20 July 2011 19:34

Hello, Sergey.

I have used a function with the same behaviour: String UrlDecode(const String& s) from the Web package.

The following string

%u0410%u0422%u0417

converts to

АТЗ

Unfortunately, the strings also contain literal "\" and "%" characters, so simply replacing "\" with "%" will not work. We will need either two different functions or an extra parameter for UrlDecode.

In any case, thank you!

[Updated on: Wed, 20 July 2011 19:24]


Re: A new function to Web Package Unicode-Escape-Javascript -> Unicode [message #33257 is a reply to message #33256]

Wed, 20 July 2011 21:01

sergeynikitin
I propose extending the UrlDecode function to accept the \uXXXX form in addition to the %uXXXX encoding. The \uXXXX form is kept for old browsers, and it is as standard as the %uXXXX form.

Interesting Online Encodings converter:
http://rishida.net/tools/conversion/

Please apply a patch to this function:

String UrlDecode(const char *b, const char *e)
{
        StringBuffer out;
        byte d1, d2, d3, d4;
        for(const char *p = b; p < e; p++)
                if(*p == '+')
                        out.Cat(' ');
                else if(*p == '%' && (d1 = ctoi(p[1])) < 16 && (d2 = ctoi(p[2])) < 16) {
                        out.Cat(d1 * 16 + d2);
                        p += 2;
                }
                else if(*p == '%' && (p[1] == 'u' || p[1] == 'U')
                && (d1 = ctoi(p[2])) < 16 && (d2 = ctoi(p[3])) < 16
                && (d3 = ctoi(p[4])) < 16 && (d4 = ctoi(p[5])) < 16) {
                        out.Cat(WString((d1 << 12) | (d2 << 8) | (d3 << 4) | d4, 1).ToString());
                        p += 5;
                }
                else
                        out.Cat(*p);
        return out;
}

I propose a change like this:

String UrlDecode(const char *b, const char *e)
{
        StringBuffer out;
        byte d1, d2, d3, d4;
        for(const char *p = b; p < e; p++)
                if(*p == '+')
                        out.Cat(' ');
                else if(*p == '%' && (d1 = ctoi(p[1])) < 16 && (d2 = ctoi(p[2])) < 16) {
                        out.Cat(d1 * 16 + d2);
                        p += 2;
                }
                else if((*p == '%' || *p == '\\') && (p[1] == 'u' || p[1] == 'U') // <-This line changed
                && (d1 = ctoi(p[2])) < 16 && (d2 = ctoi(p[3])) < 16
                && (d3 = ctoi(p[4])) < 16 && (d4 = ctoi(p[5])) < 16) {
                        out.Cat(WString((d1 << 12) | (d2 << 8) | (d3 << 4) | d4, 1).ToString());
                        p += 5;
                }
                else
                        out.Cat(*p);
        return out;
}
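
For readers without U++ at hand, the changed branch can be sketched in standard C++ as follows. This is only an illustration under stated assumptions: DecodeUEscapes, HexValue and AppendUtf8 are hypothetical names, the output is assumed to be UTF-8, and surrogate pairs are not combined. Unlike the pointer-based version above, it checks that six characters are available before reading them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of one hex digit, or -1 if c is not a hex digit.
static int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Append a code point in the range 0..0xFFFF to out as UTF-8.
static void AppendUtf8(std::string& out, unsigned cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical function: decode \uXXXX and %uXXXX, copy everything else unchanged.
std::string DecodeUEscapes(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char c = s[i];
        // Prefix character, 'u' or 'U', and four hex digits: six characters in total.
        if ((c == '%' || c == '\\') && i + 6 <= s.size()
            && (s[i + 1] == 'u' || s[i + 1] == 'U')) {
            int d1 = HexValue(s[i + 2]), d2 = HexValue(s[i + 3]);
            int d3 = HexValue(s[i + 4]), d4 = HexValue(s[i + 5]);
            if (d1 >= 0 && d2 >= 0 && d3 >= 0 && d4 >= 0) {
                AppendUtf8(out, (d1 << 12) | (d2 << 8) | (d3 << 4) | d4);
                i += 5;  // the loop increment skips the sixth character
                continue;
            }
        }
        out += c;
    }
    return out;
}
```

Incomplete or malformed sequences such as "\u04" are copied through as they are, which matches the fall-through branch of the function above.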


Re: A new function to Web Package Unicode-Escape-Javascript -> Unicode [message #33258 is a reply to message #33257]

Wed, 20 July 2011 21:13

sergeynikitin
This function is called unescape in JavaScript.

So maybe we should have two separate functions, UrlDecode and UnEscape?
The logic of JavaScript's unescape is a bit different: it does not replace "+" with a space.

[Updated on: Wed, 20 July 2011 21:16]


Re: A new function to Web Package Unicode-Escape-Javascript -> Unicode [message #33259 is a reply to message #33258]

Wed, 20 July 2011 22:22

Sender Ghost
Messages: 301
Registered: November 2008

Senior Member

sergeynikitin wrote on Wed, 20 July 2011 21:13

The logic of JavaScript's unescape is a bit different: it does not replace "+" with a space.

That is because the UrlEncode/UrlDecode functions are, as their names suggest, meant for URLs. I also think separate functions are needed for content, perhaps built on a common implementation.

UrlDecode also shows the optimizations you mentioned: it uses bit shifts instead of regular expressions.
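
One possible shape for "separate functions with a common implementation" is sketched below in standard C++. All names here (DecodePercent, UrlDecodeSketch, UnescapeSketch) are hypothetical and are not the Web package API. To stay short, the sketch covers only %XX and the "+" rule; the only difference between the two public functions is a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of one hex digit, or -1 if c is not a hex digit.
static int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical shared core: decodes %XX everywhere and turns '+' into a space
// only when plusAsSpace is set.
static std::string DecodePercent(const std::string& s, bool plusAsSpace) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '+' && plusAsSpace) {
            out += ' ';
        } else if (s[i] == '%' && i + 3 <= s.size()
                   && HexValue(s[i + 1]) >= 0 && HexValue(s[i + 2]) >= 0) {
            out += static_cast<char>((HexValue(s[i + 1]) << 4) | HexValue(s[i + 2]));
            i += 2;  // skip the two hex digits
        } else {
            out += s[i];
        }
    }
    return out;
}

// URL flavour: '+' stands for a space.
std::string UrlDecodeSketch(const std::string& s) { return DecodePercent(s, true); }

// Content flavour, like JavaScript's unescape: '+' is kept as it is.
std::string UnescapeSketch(const std::string& s) { return DecodePercent(s, false); }
```

With the input "a+b%20c" the URL flavour yields "a b c", while the content flavour keeps the plus sign and yields "a+b c".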

References:
- Some implementation of UrlEncode/UrlDecode.
- About UrlEncode on Wikipedia and why " " (space) converted to "+" instead of "%20" on early stage(s).

[Updated on: Thu, 21 July 2011 05:39]
