[SOLVED] String.GetCount with umlaut [message #33707] Tue, 06 September 2011 23:12
forlano
Messages: 1182
Registered: March 2006
Location: Italy
Senior Contributor
Hello,

GetCount() returns 7 for the string "lubäck".

I see only 6 characters. What is the trick?
I guess 'ä' is counted twice, but how can I find out how many characters are really in a string?

I am having a problem aligning strings in a text file when they contain accented characters. If even GetCount() gives an incorrect answer, I cannot correct the displayed row length by adding extra spaces (it seems that one space is eaten for each accented character). By the way, this is an old issue that I was never able to resolve in my application.

//SetDefaultCharset(CHARSET_UTF8);
String ss, t = "lubäck";
ss << t.GetCount();
SaveFile("out.txt", ss);


Thanks,
Luigi

[Updated on: Thu, 15 September 2011 09:33]

Re: String.GetCount with umlaut [message #33709 is a reply to message #33707] Wed, 07 September 2011 08:20
forlano
Messages: 1182
Registered: March 2006
Location: Italy
Senior Contributor
Here is what I mean by aligning characters:

[Attached image: index.php?t=getfile&id=3434&private=0]

I tried to fit the names with %-32.32s, but the accented characters are counted twice (!), so the visual effect was that of an eaten space or a shift to the left.

Perhaps I must convert the names to some other encoding before saving them; let me try... Anyway, I think the number of characters should be calculated correctly under every encoding.

Luigi

[Updated on: Wed, 07 September 2011 08:36]

Re: String.GetCount with umlaut [message #33710 is a reply to message #33709] Wed, 07 September 2011 08:27
forlano
Messages: 1182
Registered: March 2006
Location: Italy
Senior Contributor
forlano wrote on Wed, 07 September 2011 08:20


Perhaps I must convert the names to some other encoding before saving them; let me try...



Solved!

out << NFormat(" %-32.32s ", ToCharset(CHARSET_WIN1252, player[i].name, CHARSET_UTF8 ));

This works.
With Notepad and WordPad the accents are OK and the names are aligned (problem solved after 5 years), but within my app (UTF-8 encoded) the accented characters have disappeared. This last behaviour should be normal.

The new questions are now:
1) Which CHARSET_WIN??? should I use for my text file on Windows, for Latin letters, to accommodate the maximum number of accents (German, Italian, Danish, French...)?
2) Do I need to convert under Linux as well to prevent this problem?
I have no experience in this matter.
Luigi

[Updated on: Wed, 07 September 2011 08:35]

Re: String.GetCount with umlaut [message #33713 is a reply to message #33707] Wed, 07 September 2011 09:46
mirek
Messages: 13975
Registered: November 2005
Ultimate Member
forlano wrote on Tue, 06 September 2011 17:12

Hello,

GetCount() returns 7 for the string "lubäck".

I see only 6 characters. Where is the trick?
I guess 'ä' is counted twice, but how can I know how many characters are really there in a string?



GetCount or GetLength returns the number of _bytes_ in the string.

When the string is UTF-8 encoded, this can be more than the number of characters it represents, because each non-ASCII character takes more than one byte.

A good way to get the character count is

.ToWString().GetCount()

or you can use the "utf8len" function too.

(Anyway, thinking about it, perhaps I should add a "GetCharCount()" method to String...)

Mirek
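
To make the byte count versus character count difference concrete, here is a minimal sketch in standard C++ rather than U++. The helper name utf8_char_count is made up for this illustration and is not the U++ implementation of utf8len or GetCharCount(); it also assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of U++): counts UTF-8 characters (code points)
// by counting every byte that is NOT a continuation byte of the form 10xxxxxx.
// Assumes the input is valid UTF-8.
std::size_t utf8_char_count(const std::string& s)
{
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80)  // lead byte or plain ASCII byte
            ++n;
    return n;
}
```

For the string from the first post, written with explicit bytes as "lub\xC3\xA4" "ck", size() is 7 (bytes) while utf8_char_count() gives 6, because 'ä' is encoded as the two bytes C3 A4.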
Re: String.GetCount with umlaut [message #33714 is a reply to message #33707] Wed, 07 September 2011 09:47
mr_ped
Messages: 825
Registered: November 2005
Location: Czech Republic - Praha
Experienced Contributor
UTF-8 is a multibyte encoding with variable length for different characters.

IMHO the most correct solution for you is to convert from UTF-8 to WString (UCS-2), which should be enough to cover *all* Latin-like alphabets, the Russian azbuka as well, and some more. (I'm afraid UCS-2 does not cover all Chinese characters etc.; no time to google.) Then work with WString to count characters, etc.

edit: Mirek beat me...

Quote:

(Anyway, thinking about it, perhaps I shall add "GetCharCount()" method to String...).


Indeed, for sure. Actually, UTF-8 would love to have a full set of functions like len/right/left/mid/etc. wherever a number of characters (a position) is used as a parameter. Plus a special section in the documentation to explain it.

[Updated on: Wed, 07 September 2011 09:49]
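
On the UCS-2 remark above: a 16-bit unit can hold code points only up to U+FFFF, so characters beyond that range (some CJK ideographs among them) do not fit in one unit. The sketch below is standard C++, not U++ code; utf8_decode and fits_in_ucs2 are hypothetical names, and the decoder assumes valid UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decodes valid UTF-8 into code points (no validation).
std::vector<std::uint32_t> utf8_decode(const std::string& s)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::uint32_t cp;
        std::size_t len;
        // The lead byte tells how many bytes the character occupies.
        if (c < 0x80)      { cp = c;        len = 1; }
        else if (c < 0xE0) { cp = c & 0x1F; len = 2; }
        else if (c < 0xF0) { cp = c & 0x0F; len = 3; }
        else               { cp = c & 0x07; len = 4; }
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// True if every code point fits in a single 16-bit unit (U+0000..U+FFFF).
bool fits_in_ucs2(const std::string& s)
{
    for (std::uint32_t cp : utf8_decode(s))
        if (cp > 0xFFFF)
            return false;
    return true;
}
```

With this sketch, "lubäck" and a common ideograph such as U+4E2D fit, while U+20000 (bytes F0 A0 80 80) does not.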

Re: String.GetCount with umlaut [message #33715 is a reply to message #33713] Wed, 07 September 2011 09:57
mirek
Messages: 13975
Registered: November 2005
Ultimate Member
mirek wrote on Wed, 07 September 2011 03:46


(Anyway, thinking about it, perhaps I shall add "GetCharCount()" method to String...).

Mirek


OK, you can now call GetCharCount() for String.

Mirek
Re: String.GetCount with umlaut [message #33716 is a reply to message #33715] Wed, 07 September 2011 10:09
forlano
Messages: 1182
Registered: March 2006
Location: Italy
Senior Contributor
mirek wrote on Wed, 07 September 2011 09:57



OK, you can now call GetCharCount() for String.

Mirek


Great!
Thanks,
Luigi
Re: String.GetCount with umlaut [message #33717 is a reply to message #33715] Wed, 07 September 2011 11:11
forlano
Messages: 1182
Registered: March 2006
Location: Italy
Senior Contributor
mirek wrote on Wed, 07 September 2011 09:57



OK, you can now call GetCharCount() for String.

Mirek


I wonder whether
NFormat(" %-40.40s", s)
should fit the string by counting its characters or its bytes. From the previous example, it seems to work with bytes, so the alignment fails with accented characters, which are multibyte.

Luigi
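
As an illustration of what counting characters instead of bytes would mean for a "%-40.40s" style field, here is a small sketch in standard C++. It is not how U++ Format is implemented; fit_to_width_utf8 is a hypothetical name, and the code assumes valid UTF-8 and treats every code point as one column wide.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: left-justifies s in a field of exactly `width` characters,
// truncating on a character boundary and padding with spaces on the right.
// Assumes valid UTF-8 and one display column per code point.
std::string fit_to_width_utf8(const std::string& s, std::size_t width)
{
    std::string out;
    std::size_t chars = 0;
    std::size_t i = 0;
    while (i < s.size() && chars < width) {
        // Copy one whole character: its first byte plus any continuation bytes.
        out += s[i++];
        while (i < s.size() && (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80)
            out += s[i++];
        ++chars;
    }
    out.append(width - chars, ' ');  // pad by characters, not by bytes
    return out;
}
```

With a width of 8, "lubäck" gets two trailing spaces and the result is 9 bytes long; padding by byte count would add only one space, which is the "eaten space" effect described earlier in the thread.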

[Updated on: Wed, 07 September 2011 11:11]

Re: String.GetCount with umlaut [message #33721 is a reply to message #33707] Thu, 08 September 2011 09:50
mr_ped
Messages: 825
Registered: November 2005
Location: Czech Republic - Praha
Experienced Contributor
I'm afraid you are opening a can of worms right now...
(and it's probably the right time to open it, as Mirek is collecting those things for the ToDo list)
Re: String.GetCount with umlaut [message #33727 is a reply to message #33721] Thu, 08 September 2011 21:17
forlano
Messages: 1182
Registered: March 2006
Location: Italy
Senior Contributor
mr_ped wrote on Thu, 08 September 2011 09:50

I'm afraid you are opening a can of worms right now...
(and it's probably the right time to open it, as Mirek is collecting those things for the ToDo list)


Oops... I am sorry, but an NFormatChar() would be enough to close the can.

Re: String.GetCount with umlaut [message #33732 is a reply to message #33727] Fri, 09 September 2011 10:20
mirek
Messages: 13975
Registered: November 2005
Ultimate Member
forlano wrote on Thu, 08 September 2011 15:17

mr_ped wrote on Thu, 08 September 2011 09:50

I'm afraid you are opening a can of worms right now...
(and it's probably the right time to open it, as Mirek is collecting those things for the ToDo list)


Oops... I am sorry, but an NFormatChar() would be enough to close the can.




Nah, I guess it should work in all cases, that is, with the normal Format... (Note that NFormat = Format, and has been for about 4 years.)

Mirek
Re: String.GetCount with umlaut [message #33735 is a reply to message #33732] Fri, 09 September 2011 12:43
mirek
Messages: 13975
Registered: November 2005
Ultimate Member
Format now supports utf-8 for padding...

Mirek
Re: String.GetCount with umlaut [message #33738 is a reply to message #33735] Fri, 09 September 2011 14:08
forlano
Messages: 1182
Registered: March 2006
Location: Italy
Senior Contributor
mirek wrote on Fri, 09 September 2011 12:43

Format now supports utf-8 for padding...

Mirek


:-D
Re: String.GetCount with umlaut [message #33809 is a reply to message #33738] Thu, 15 September 2011 09:32
forlano
Messages: 1182
Registered: March 2006
Location: Italy
Senior Contributor
mirek wrote on Fri, 09 September 2011 12:43

Format now supports utf-8 for padding...
Mirek


The new Format() works perfectly, and it is a pleasure to see text full of accented characters properly aligned (little things are enough to make me happy).
As mr_ped said, in the future it would be nice to have some other functions working with characters instead of bytes. One function that should be at the top of the list is Mid(). Sometimes one cuts a string in the middle of a two-byte character, and this produces a mess in the QTF viewer. Unfortunately, one cannot know in advance where the accented characters are. In the meantime I'll try converting to WString.

Luigi
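
The problem with a byte-based Mid() can be sketched in standard C++ as follows. This is not a U++ API: utf8_offset and utf8_mid are hypothetical names used only for this illustration, and the code assumes valid UTF-8. Converting to WString, as suggested in the thread, avoids the need for such helpers.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: byte offset at which the char_index-th character starts,
// or s.size() if the string has fewer characters. Assumes valid UTF-8.
std::size_t utf8_offset(const std::string& s, std::size_t char_index)
{
    std::size_t i = 0;
    while (i < s.size() && char_index > 0) {
        ++i;  // step over the first byte of the current character
        while (i < s.size() && (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80)
            ++i;  // step over its continuation bytes
        --char_index;
    }
    return i;
}

// Hypothetical character-based Mid(): `count` characters starting at character `pos`.
// Cuts only on character boundaries, so no multibyte character is split.
std::string utf8_mid(const std::string& s, std::size_t pos, std::size_t count)
{
    std::size_t begin = utf8_offset(s, pos);
    std::size_t end = utf8_offset(s, pos + count);
    return s.substr(begin, end - begin);
}
```

For "lubäck", taking the first 4 bytes with substr(0, 4) ends on the lone byte C3, which is half of 'ä', while utf8_mid(s, 0, 4) returns the 5 bytes of "lubä".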
Re: String.GetCount with umlaut [message #33810 is a reply to message #33809] Thu, 15 September 2011 11:38
mirek
Messages: 13975
Registered: November 2005
Ultimate Member
forlano wrote on Thu, 15 September 2011 03:32

mirek wrote on Fri, 09 September 2011 12:43

Format now supports utf-8 for padding...
Mirek


The new Format() works perfectly, and it is a pleasure to see text full of accented characters properly aligned (little things are enough to make me happy).
As mr_ped said, in the future it would be nice to have some other functions working with characters instead of bytes. One function that should be at the top of the list is Mid(). Sometimes one cuts a string in the middle of a two-byte character, and this produces a mess in the QTF viewer. Unfortunately, one cannot know in advance where the accented characters are. In the meantime I'll try converting to WString.



WString is the preferred way here...

Format was the only thing worth fixing, as it sits in the "grey area" between the WString way and the String way...


Mirek