A new function to Web Package Unicode-Escape-Javascript -> Unicode [message #33247] Wed, 20 July 2011 09:28
sergeynikitin
Messages: 748
Registered: January 2008
Location: Moscow, Russia
Contributor

I propose adding a new function to the Web package: Unicode-Escape-Javascript -> Unicode.

JavaScript uses a special escape encoding for international (non-Latin) characters. It looks like: \u0410\u0422\u0417 ...
To convert this encoding to Unicode I needed a function. I have not found one, so I wrote it in haste.

Maybe someone else will need it.

I also want to note that some of the operations inside the function can be optimized. For example, the repeated multiplication by 16 can be replaced by a bit shift.

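String Javascript2Unicode(String s) {
	String res;
	// Match \uXXXX; the group captures the four hex digits (either case).
	RegExp reg("\\\\u([0-9a-fA-F]{4})",RegExp::MULTILINE);
	
	reg.Clear();
	
	int i_start,i_end,i_old=0;
	while (reg.GlobalMatch(s)) {
		reg.GetMatchPos(0,i_start,i_end);
		// Copy the text before the escape; the -2 steps back over the "\u" prefix.
		res << s.Mid(i_old,i_start-i_old-2);
		wchar wc[2] = {0,0};
		WString ws;
		String m = (String)reg[0];
		// ctoi gives the value of a hex digit; StrInt parses decimal only,
		// so it cannot handle the digits a-f.
		wc[0] = ctoi(m[0])*16*16*16
				+ctoi(m[1])*16*16
				+ctoi(m[2])*16
				+ctoi(m[3]);
		
		ws = wc;
		String ss = ws.ToString();
		res << ss;
		i_old = i_end;
	}
	// Append the text that follows the last escape.
	res << s.Mid(i_old);
	return res;
}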
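
For comparison, here is a minimal sketch of the same conversion without regular expressions, using bit shifts as suggested above. It is plain standard C++ rather than U++ code, and the names HexValue, AppendUtf8 and DecodeJsUnicodeEscapes are made up for this illustration. It assumes the result should be UTF-8 and it does not combine surrogate pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of a hex digit, or -1 if c is not one.
static int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Append one code point in the range 0..0xFFFF to out as UTF-8.
static void AppendUtf8(std::string& out, unsigned code) {
    assert(code <= 0xFFFF);
    if (code < 0x80) {
        out += static_cast<char>(code);
    } else if (code < 0x800) {
        out += static_cast<char>(0xC0 | (code >> 6));
        out += static_cast<char>(0x80 | (code & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (code >> 12));
        out += static_cast<char>(0x80 | ((code >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (code & 0x3F));
    }
}

// Hypothetical helper: replace every \uXXXX in s with its UTF-8 encoding.
// Anything that is not a complete \uXXXX escape is copied unchanged.
std::string DecodeJsUnicodeEscapes(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '\\' && i + 5 < s.size() && s[i + 1] == 'u') {
            int d1 = HexValue(s[i + 2]), d2 = HexValue(s[i + 3]);
            int d3 = HexValue(s[i + 4]), d4 = HexValue(s[i + 5]);
            if (d1 >= 0 && d2 >= 0 && d3 >= 0 && d4 >= 0) {
                // Four hex digits combined with shifts instead of multiplications.
                unsigned code = static_cast<unsigned>((d1 << 12) | (d2 << 8) | (d3 << 4) | d4);
                AppendUtf8(out, code);
                i += 5; // skip the rest of the escape
                continue;
            }
        }
        out += s[i];
    }
    return out;
}
```

Because the input is scanned once and the tail is copied by the same loop, no separate step is needed for the text after the last escape.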


PS
Maybe something like this already exists and I wasted 4 hours? Let me know if it does.


SergeyNikitin<U++>( linux, wine )
{
    under( Ubuntu || Debian || Raspbian );
}
Re: A new function to Web Package Unicode-Escape-Javascript -> Unicode [message #33256 is a reply to message #33247] Wed, 20 July 2011 19:23
sergeynikitin

Sender Ghost wrote PM on Wed, 20 July 2011 19:34

Hello, Sergey.

I used a function with the same behaviour. It is String UrlDecode(const String& s) from the Web package.

The following string
%u0410%u0422%u0417

converts to
АТЗ



Unfortunately, the string can also contain literal "\" and "%" characters, so simply replacing "\" with "%" will not work. We will need either two different functions or an extra parameter for UrlDecode.
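
To illustrate the problem, here is a small sketch in plain standard C++. PercentDecode is only a stand-in for the %XX part of UrlDecode, and both function names are invented for this example. A literal backslash followed by two hex digits gets misread as an escape once it has been turned into "%".

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Value of a hex digit, or -1 if c is not one.
static int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Minimal %XX decoder, standing in for UrlDecode in this illustration only.
std::string PercentDecode(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size()) {
            int hi = HexValue(s[i + 1]), lo = HexValue(s[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += s[i];
    }
    return out;
}

// The naive approach: turn every backslash into a percent sign first.
std::string NaiveBackslashToPercent(std::string s) {
    std::replace(s.begin(), s.end(), '\\', '%');
    return s;
}
```

With the input C:\dead the replacement gives C:%dead, and decoding then swallows "%de" as the single byte 0xDE, so the original text is damaged.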

In any case, thank you!



[Updated on: Wed, 20 July 2011 19:24]


Re: A new function to Web Package Unicode-Escape-Javascript -> Unicode [message #33257 is a reply to message #33256] Wed, 20 July 2011 21:01
sergeynikitin

I propose extending the UrlDecode function to accept the \uXXXX form in addition to the %uXXXX encoding. The \uXXXX form is reserved for old browsers and is as standard as the %uXXXX form.

Interesting Online Encodings converter:
http://rishida.net/tools/conversion/

Please apply a patch to this function:
String UrlDecode(const char *b, const char *e)
{
        StringBuffer out;
        byte d1, d2, d3, d4;
        for(const char *p = b; p < e; p++)
                if(*p == '+')
                        out.Cat(' ');
                else if(*p == '%' && (d1 = ctoi(p[1])) < 16 && (d2 = ctoi(p[2])) < 16) {
                        out.Cat(d1 * 16 + d2);
                        p += 2;
                }
                else if(*p == '%' && (p[1] == 'u' || p[1] == 'U')
                && (d1 = ctoi(p[2])) < 16 && (d2 = ctoi(p[3])) < 16
                && (d3 = ctoi(p[4])) < 16 && (d4 = ctoi(p[5])) < 16) {
                        out.Cat(WString((d1 << 12) | (d2 << 8) | (d3 << 4) | d4, 1).ToString());
                        p += 5;
                }
                else
                        out.Cat(*p);
        return out;
}

I propose a change like this:
String UrlDecode(const char *b, const char *e)
{
        StringBuffer out;
        byte d1, d2, d3, d4;
        for(const char *p = b; p < e; p++)
                if(*p == '+')
                        out.Cat(' ');
                else if(*p == '%' && (d1 = ctoi(p[1])) < 16 && (d2 = ctoi(p[2])) < 16) {
                        out.Cat(d1 * 16 + d2);
                        p += 2;
                }
                else if((*p == '%' || *p == '\\') && (p[1] == 'u' || p[1] == 'U') // <-This line changed
                && (d1 = ctoi(p[2])) < 16 && (d2 = ctoi(p[3])) < 16
                && (d3 = ctoi(p[4])) < 16 && (d4 = ctoi(p[5])) < 16) {
                        out.Cat(WString((d1 << 12) | (d2 << 8) | (d3 << 4) | d4, 1).ToString());
                        p += 5;
                }
                else
                        out.Cat(*p);
        return out;
}


Re: A new function to Web Package Unicode-Escape-Javascript -> Unicode [message #33258 is a reply to message #33257] Wed, 20 July 2011 21:13
sergeynikitin

In JavaScript this function is called unescape.

So maybe we should make two different functions, UrlDecode and UnEscape?
The logic of JavaScript's unescape is a bit different: it does not replace "+" with a space.
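
A rough sketch of what such a separate function could look like, written in plain standard C++ and not taken from the Web package. UnEscapeSketch, HexValue and AppendUtf8 are hypothetical names. The sketch assumes UTF-8 output, treats %XX as a character code 0..255, accepts a lowercase "u" only, and leaves "+" untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of a hex digit, or -1 if c is not one.
static int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Append one code point in the range 0..0xFFFF to out as UTF-8.
static void AppendUtf8(std::string& out, unsigned code) {
    assert(code <= 0xFFFF);
    if (code < 0x80) {
        out += static_cast<char>(code);
    } else if (code < 0x800) {
        out += static_cast<char>(0xC0 | (code >> 6));
        out += static_cast<char>(0x80 | (code & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (code >> 12));
        out += static_cast<char>(0x80 | ((code >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (code & 0x3F));
    }
}

// Hypothetical unescape-style decoder: handles %uXXXX and %XX, copies
// everything else (including '+') unchanged.
std::string UnEscapeSketch(const std::string& s) {
    std::string out;
    const std::size_t n = s.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (s[i] == '%') {
            // %uXXXX needs five more characters after the '%'.
            if (i + 5 < n && s[i + 1] == 'u') {
                int d1 = HexValue(s[i + 2]), d2 = HexValue(s[i + 3]);
                int d3 = HexValue(s[i + 4]), d4 = HexValue(s[i + 5]);
                if (d1 >= 0 && d2 >= 0 && d3 >= 0 && d4 >= 0) {
                    AppendUtf8(out, static_cast<unsigned>((d1 << 12) | (d2 << 8) | (d3 << 4) | d4));
                    i += 5;
                    continue;
                }
            }
            // %XX needs two more characters after the '%'.
            if (i + 2 < n) {
                int hi = HexValue(s[i + 1]), lo = HexValue(s[i + 2]);
                if (hi >= 0 && lo >= 0) {
                    AppendUtf8(out, static_cast<unsigned>((hi << 4) | lo));
                    i += 2;
                    continue;
                }
            }
        }
        out += s[i]; // literal character, '+' included
    }
    return out;
}
```

Keeping this apart from UrlDecode means the "+" handling never has to be switched on or off by a parameter.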



[Updated on: Wed, 20 July 2011 21:16]


Re: A new function to Web Package Unicode-Escape-Javascript -> Unicode [message #33259 is a reply to message #33258] Wed, 20 July 2011 22:22
Sender Ghost
Messages: 301
Registered: November 2008
Senior Member
sergeynikitin wrote on Wed, 20 July 2011 21:13


The logic of JavaScript's unescape is a bit different: it does not replace "+" with a space.

That is because the UrlEncode/UrlDecode functions (as their names suggest) are meant for URLs. I also think separate functions are needed for content, perhaps with a shared general implementation.

In UrlDecode you can see the optimizations you mentioned: bit shifts, and no regular expressions.

References:
- Some implementation of UrlEncode/UrlDecode.
- About UrlEncode on Wikipedia, and why " " (space) was converted to "+" instead of "%20" in the early days.

[Updated on: Thu, 21 July 2011 05:39]
