Re: Strings with national specific characters are wrongly sorted - Sort [message #57473 is a reply to message #57468] Fri, 27 August 2021 09:51
mirek
Klugier wrote on Wed, 25 August 2021 13:01
Hello,

Today I found that Sort returns wrong results for strings with national-specific characters:
#include <Core/Core.h>

using namespace Upp;

CONSOLE_APP_MAIN
{
	Vector<WString> vec = { "Zbig", "Ąć", "Ęc", "Ala", "Edward" };
	Sort(vec);
	
	for (const auto& s : vec)
	{
		Cout() << s << "\n";
	}
}


The results are:
Ala
Edward
Zbig
Ąć
Ęc


and should be:
Ala
Ąć
Edward
Ęc
Zbig


This is probably a corner case, because these words don't exist in Polish, but the error is there anyway. I believe it is more severe when these characters are in the middle of the string and we have a lot of such words.

Here is the article about the Polish alphabet and the order of its letters.

Klugier


This is not an error; the base [W]String comparison simply compares character values.
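
To illustrate why comparing character values gives the reported order, here is a minimal sketch in standard C++ rather than U++. It assumes std::u32string as a stand-in for WString, and the helper names SortByCodePoint and SampleWords are hypothetical. Ą is U+0104 and Ę is U+0118, both greater than Z (U+005A), so they land after "Zbig".

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: sorts by raw code point values, which is what a
// plain operator< on std::u32string does (assumed here to mirror the
// base string comparison described above).
std::vector<std::u32string> SortByCodePoint(std::vector<std::u32string> v)
{
    std::sort(v.begin(), v.end());
    return v;
}

// The sample words from the report, written with escapes so the result
// does not depend on the source file encoding:
// U+0104 = A with ogonek, U+0107 = c with acute, U+0118 = E with ogonek.
std::vector<std::u32string> SampleWords()
{
    return { U"Zbig", U"\u0104\u0107", U"\u0118c", U"Ala", U"Edward" };
}
```

Calling SortByCodePoint(SampleWords()) yields Ala, Edward, Zbig, Ąć, Ęc, the same order as in the report.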

You need to use NLS-specific sorting in this situation: LanguageInfo::Compare. That said, it is really defined specifically just for CZ, and even there it would need improvement. OTOH, the generic routine should at least work better than the result you get.
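
As a rough idea of what a language-specific comparison involves, here is a sketch in standard C++. It is not LanguageInfo::Compare and does not use the U++ API; PolishRank and PolishLess are hypothetical names, and the approach (ranking each letter by its position in the Polish alphabet, ignoring case) is a simplification of real collation rules.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: position of c in the Polish alphabet (plus q, v, x),
// ignoring case. Characters outside the table sort after all letters,
// ordered by code point.
std::size_t PolishRank(char32_t c)
{
    static const std::u32string lower =
        U"a\u0105bc\u0107de\u0119fghijkl\u0142mn\u0144o\u00F3pqrs\u015Btuvwxyz\u017A\u017C";
    static const std::u32string upper =
        U"A\u0104BC\u0106DE\u0118FGHIJKL\u0141MN\u0143O\u00D3PQRS\u015ATUVWXYZ\u0179\u017B";
    std::size_t i = lower.find(c);
    if (i == std::u32string::npos)
        i = upper.find(c);
    return i != std::u32string::npos ? i : lower.size() + c;
}

// Hypothetical comparison: compares letter by letter using alphabet ranks.
bool PolishLess(const std::u32string& a, const std::u32string& b)
{
    std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; i++) {
        std::size_t ra = PolishRank(a[i]), rb = PolishRank(b[i]);
        if (ra != rb)
            return ra < rb;
    }
    if (a.size() != b.size())
        return a.size() < b.size(); // a proper prefix sorts first
    return a < b; // same letters: fall back to code points for a total order
}
```

Passing PolishLess as the predicate to std::sort puts the sample words in the order Ala, Ąć, Edward, Ęc, Zbig. A table like this covers one language only, which hints at why doing this correctly for many languages is hard.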

BTW, language-specific sorting is an extremely difficult topic if it is to be done right in many languages...

Mirek

[Updated on: Fri, 27 August 2021 10:18]
