Sorting of strings is broken [message #52005]
Thu, 04 July 2019 03:46
Novo (Ultimate Contributor, Messages: 1371, Registered: December 2006)
Sorting of strings is broken, or I'm missing something ...
```cpp
#include <Core/Core.h>

using namespace Upp;

static const char* sa[] = {
    "1911",
    "1911 AL Record vs. opponents",
    "1911 AL Record vs. opponents/doc",
    "1911 American League Standings",
    "1911 American League standings",
    "1911 Big 9 football standings",
    "1911 Britannica articles needing updates progress",
    "1911 College Football Composite All-Southerns",
    "1911 College Football Consensus All-Americans",
    "1911 Essendon Bombers premiership players",
    "1911 Essendon premiership players",
    "1911 Havmanden class submarines",
    "1911 Helms Foundation NCAA Men's Basketball All-Americans",
    "1911 MLB season by team",
    "1911 Missouri Valley football standings",
    "1911 NL Record vs. opponents",
    "1911 NL Record vs. opponents/doc",
    "1911 National League Standings",
    "1911 National League standings",
    "1911 Philadelphia Athletics",
    "1911 Princeton Tigers football navbox",
    "1911 RMFAC football standings",
    "1911 SAIAA football standings",
    "1911 SIAA football standings",
    "1911 South Australia State Football Team",
    "1911 Western Conference football standings",
    "1911 college football independents records",
    "1911 college football records",
    "1911 films",
    "1911 in Asian football (AFC)",
    "1911 shipwrecks",
    "1911-12 Big Nine Conference men's basketball standings",
    "1911-12 Big Ten Conference men's basketball standings",
    "1911-12 Essendon Bombers dual premiership players",
    "1911-12 NHA season by team",
    "1911-12 NHA standings",
    "1911-12 Western Conference men's basketball standings",
    "1911-12 in English football",
    "1911-12 in European Football (UEFA)",
    "1911-12 in European football (UEFA)",
    "1911-12 in Scottish football",
    "1911/12 Essendon Bombers dual premiership players",
    "1911/12 Essendon dual premiership players",
    "1911/doc",
    "1911/sandbox",
    "1911EB",
    "1911s/doc",
    "191112 Big Nine Conference men's basketball standings",
    "191112 Big Ten Conference men's basketball standings",
    "191112 Essendon Bombers dual premiership players",
    "191112 NHA season by team",
    "191112 NHA standings",
    "191112 Western Conference men's basketball standings",
    "191112 in English football",
    "191112 in European Football (UEFA)",
    "191112 in European football (UEFA)",
    "191112 in Scottish football",
    "1911s",
    NULL, // end marker for the loop below
};

CONSOLE_APP_MAIN
{
    // Sanity check: operator< on this single pair is expected to be correct.
    if (String("1911s/doc") < String("1911s"))
        NEVER();
    Vector<String> sv;
    for (int i = 0; sa[i]; ++i)
        sv.Add(sa[i]);
    Sort(sv);
    RDUMPC(sv); // log the sorted vector
}
```
Because "1911s" is less than "1911s/doc", it should be placed before it, IMHO.
The default charset is UTF-8.
Regards,
Novo
[Updated on: Thu, 04 July 2019 03:49]
Re: Sorting of strings is broken [message #52010 is a reply to message #52007]
Thu, 04 July 2019 15:17
Novo (Ultimate Contributor, Messages: 1371, Registered: December 2006)
I get it every time.
System: Ubuntu 19.04 x64.
Compiler: clang version 8.0.0-3 (tags/RELEASE_800/final), Target: x86_64-pc-linux-gnu
I discovered this problem while preparing data for another algorithm that requires a sorted list of strings as input.
$ env | grep LANG
LANGUAGE=en_US
GDM_LANG=en_US
LANG=en_US.UTF-8
The two strings below are sorted correctly...
"1911 NL Record vs. opponents",
"1911 NL Record vs. opponents/doc",
Attachment: test.zip (2.53 KB)
Regards,
Novo
[Updated on: Thu, 04 July 2019 15:53]
Re: Sorting of strings is broken [message #52011 is a reply to message #52010]
Thu, 04 July 2019 16:24
mirek (Ultimate Member, Messages: 14039, Registered: November 2005)
Thank you, fixed. The detail that triggers the bug was lost when the code was posted in [code] tags: it was UTF-8 related. (That said, I am of course quite frightened and ashamed that something like this has been in Core for quite some time...)
The problem was that this assertion failed:
```cpp
String a = "1911s";
String b = "1911-12 in Scottish football";
ASSERT(a < b);
```
The '-' there is not basic ASCII but a higher Unicode character encoded in UTF-8, i.e. a 3-byte sequence starting with 0xe3, so b should be sorted after a. However, in one critical place the characters were compared as signed, which broke the transitivity of <, and that in turn led to 1911s being sorted after 1911s/doc (see the SVN revision for details).
I am sorry for any inconvenience this has caused.
(Note: the '-' in the code section here is a normal ASCII hyphen; I had to change it because otherwise the forum ignores it.)
Mirek
[Updated on: Thu, 04 July 2019 16:25]