Strings with national specific characters are wrongly sorted - Sort [message #57468]
Wed, 25 August 2021 13:01
Klugier
Messages: 1099 Registered: September 2012 Location: Poland, Kraków
Senior Contributor
Hello,
Today I found that Sort returns wrong results for strings containing national special characters:
#include <Core/Core.h>

using namespace Upp;

CONSOLE_APP_MAIN
{
    Vector<WString> vec = { "Zbig", "Ąć", "Ęc", "Ala", "Edward" };
    Sort(vec);
    // Iterate by const reference to avoid copying each WString
    for (const auto& s : vec)
    {
        Cout() << s << "\n";
    }
}
The results are:
and should be:
This is probably a corner case, because these words don't exist in Polish, but the error is there anyway. I believe it is more severe when these characters are in the middle of a string and there are a lot of such words.
Here is an article about the Polish alphabet and the order of its letters.
Klugier
U++ - one framework to rule them all.
[Updated on: Wed, 25 August 2021 17:49]
Re: Strings with national specific characters are wrongly sorted - Sort [message #57473 is a reply to message #57468]
Fri, 27 August 2021 09:51
mirek
Messages: 14255 Registered: November 2005
Ultimate Member
This is not an error; the base [W]String comparison simply compares character values.
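To illustrate why comparing raw character values produces this kind of order, here is a minimal sketch in standard C++ rather than U++. It assumes the strings are held as std::u32string, so each element is one Unicode code point; SortByCodePoint is a hypothetical helper name, not part of U++ Core.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not a U++ function): sorts words by raw code point
// values, i.e. a plain character-by-character comparison with no language rules.
std::vector<std::u32string> SortByCodePoint(std::vector<std::u32string> words)
{
    // std::u32string::operator< compares char32_t values lexicographically.
    std::sort(words.begin(), words.end());
    return words;
}
```

Since 'Z' is U+005A while 'Ą' is U+0104 and 'Ę' is U+0118, calling SortByCodePoint on the five words from the first post places "Zbig" ahead of "Ąć" and "Ęc", regardless of where those letters sit in the Polish alphabet.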
You need to use NLS-specific sorting in this situation: LanguageInfo::Compare. That said, it is really only specifically defined for CZ, and even there it would need improvement. OTOH, the generic routine should at least work better than the result you get.
BTW, language-specific sorting is an extremely difficult topic if it is to be done right for many languages...
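As a rough illustration of what language-specific comparison involves, the sketch below ranks characters by their position in the Polish alphabet instead of by code point. It is plain standard C++ and is not how LanguageInfo::Compare is implemented; PolishRank, PolishLess and SortPolish are hypothetical names, case is ignored, and real collation rules (tie-breaking levels, digits, punctuation, other languages) are deliberately left out.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: position of a letter in the Polish alphabet
// (a ą b c ć d e ę f g h i j k l ł m n ń o ó p q r s ś t u v w x y z ź ż),
// ignoring case. Characters outside the table sort after all letters.
int PolishRank(char32_t c)
{
    static const std::u32string lower =
        U"a\u0105bc\u0107de\u0119fghijkl\u0142mn\u0144o\u00F3pqrs\u015Btuvwxyz\u017A\u017C";
    static const std::u32string upper =
        U"A\u0104BC\u0106DE\u0118FGHIJKL\u0141MN\u0143O\u00D3PQRS\u015ATUVWXYZ\u0179\u017B";
    std::u32string::size_type pos = lower.find(c);
    if (pos == std::u32string::npos)
        pos = upper.find(c);
    if (pos != std::u32string::npos)
        return static_cast<int>(pos);
    // Unknown character: keep code point order, but after the alphabet.
    return static_cast<int>(lower.size()) + static_cast<int>(c);
}

// Compares two strings letter by letter using alphabet positions.
bool PolishLess(const std::u32string& a, const std::u32string& b)
{
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](char32_t x, char32_t y) { return PolishRank(x) < PolishRank(y); });
}

// Returns a copy of the words sorted in (simplified) Polish alphabetical order.
std::vector<std::u32string> SortPolish(std::vector<std::u32string> words)
{
    std::stable_sort(words.begin(), words.end(), PolishLess);
    return words;
}
```

With this table, SortPolish puts "Ąć" directly after "Ala" and "Ęc" directly after "Edward", and it also handles such letters in the middle of a word. Even this small table only covers one language, which hints at why doing this properly for many languages is hard.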
Mirek
[Updated on: Fri, 27 August 2021 10:18]