I ran into a bug suggesting that the sort order of strings differs between Delphi and FPC on different platforms. Hard to imagine, but the bug seemed pretty clear, so I wrote this test:
```pascal
procedure test;
var
  s : String;
  i, j : integer;
begin
  s := ' "#$%&''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~';
  for i := 2 to length(s) do
    for j := 1 to i - 1 do
      AssertTrue(AnsiCompareStr(s[j], s[i]) = -1,
        'a '+inttostr(j)+' vs '+inttostr(i) +': '+inttostr(ord(s[j]))+' < '+inttostr(ord(s[i]))+' = '+ s[j]+' < '+s[i]+')');
  s := '"#$%&''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`{|}~';
  for i := 2 to length(s) do
    for j := 1 to i - 1 do
      AssertTrue(CompareText(s[j], s[i]) = -1,
        'c '+ inttostr(j)+' vs '+inttostr(i) +': '+inttostr(ord(s[j]))+' < '+inttostr(ord(s[i]))+' = '+ s[j]+' < '+s[i]+')')
end;
```
Both loops fail for Delphi, though the first one passes for FPC. How can they fail? If it’s not operator error (how could it be? I copied an ASCII chart), then what’s going on?
It is indeed surprising that the test fails for Delphi. Your test checks that AnsiCompareStr and CompareText both order a string of ASCII characters by ascending character code.
However, there are a few potential reasons why the test might fail on Delphi:
Locale-specific behavior: AnsiCompareStr is locale-sensitive, meaning it sorts according to the collation rules of the current locale, and different locales can order the same characters differently. This could explain why the first loop fails for Delphi but passes for FPC, since FPC may not apply the locale in the same way.
Case sensitivity: CompareText is case-insensitive, whereas AnsiCompareStr is case-sensitive. Case folding is unlikely to explain the CompareText failure, though, because the second test string leaves out the lowercase letters entirely.
To investigate further, modify the test to isolate the characters that fail, so you can see which comparisons are responsible. You could also try other comparison functions: AnsiCompareText (case-insensitive) and AnsiStrComp (case-sensitive) are locale-aware like AnsiCompareStr, while CompareStr compares by ordinal value and ignores the locale. Finally, check the system locale and try switching to a different one to confirm whether the issue really is locale-specific.
Lastly, it’s worth ruling out a version-specific problem by testing the code on other Delphi versions or asking others to run it on their systems.
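Following the suggestion to isolate the problematic characters, here is a minimal sketch (assuming a console program compiled with Delphi or FPC; the program name is illustrative) that lists every pair of printable ASCII characters on which the locale-aware AnsiCompareStr disagrees with the ordinal CompareStr. The result depends on the platform and the user locale, so no output is shown.

```pascal
program FindDisagreements;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// For every pair of printable ASCII characters (codes 32..126), compare the
// ordinal CompareStr with the locale-aware AnsiCompareStr and print the
// pairs where they disagree about which character comes first.
var
  a, b, viaOrdinal, viaLocale, count: Integer;
begin
  count := 0;
  for a := 32 to 125 do
    for b := a + 1 to 126 do
    begin
      viaOrdinal := CompareStr(Chr(a), Chr(b));     // ordinal: negative, since a < b
      viaLocale := AnsiCompareStr(Chr(a), Chr(b));  // locale-aware result
      if (viaOrdinal < 0) <> (viaLocale < 0) then
      begin
        Inc(count);
        WriteLn(a:3, ' ', Chr(a), ' vs ', b:3, ' ', Chr(b),
          ': AnsiCompareStr = ', viaLocale);
      end;
    end;
  WriteLn(count, ' pair(s) where AnsiCompareStr disagrees with code order');
end.
```

An empty list means AnsiCompareStr follows plain code order on that system; any pair printed shows exactly which characters the locale reorders.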
Alright, let’s try something else. Given this program:
```pascal
program Project1;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils;

procedure injectA(ch : char; var s : String);
var
  i, c : integer;
begin
  for i := 1 to length(s) do
  begin
    c := AnsiCompareStr(ch, s[i]);
    if (c = 0) then
      exit
    else if (c = -1) then
    begin
      s.Insert(i, ch);
      exit;
    end;
  end;
  s := s + ch;
end;

procedure injectC(ch : char; var s : String);
var
  i, c : integer;
begin
  for i := 1 to length(s) do
  begin
    c := CompareText(ch, s[i]);
    if (c = 0) then
      exit
    else if (c = -1) then
    begin
      s.Insert(i, ch);
      exit;
    end;
  end;
  s := s + ch;
end;

function buildA : String;
var
  i : integer;
begin
  result := '';
  for i := 32 to 126 do
    injectA(chr(i), result);
end;

function buildC : String;
var
  i : integer;
begin
  result := '';
  for i := 32 to 126 do
    injectC(chr(i), result);
end;

procedure test;
begin
  writeln('AnsiCompareStr: '''+buildA+'''');
  writeln('CompareText: '''+buildC+'''');
end;

begin
  test;
  writeln('press return to close');
  readln;
end.
```
I get this output:
```
AnsiCompareStr: ' -'!"#$%&()*+~}|{`_^]\[@?;:/.,0>=<123456789AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
CompareText: ' !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`{|}~'
press return to close
```
This is Windows 11 under Parallels on a Mac M1. Note in particular that the - character is out of ANSI order; that’s the source of the bug that led me here.
My locale on this Windows install is English/USA, I just discovered. So I’m also asking about other locales; I’m particularly interested in the sort order of CompareText.
Just asking (because I don’t know the answer), but wouldn’t chr return a wide character (2 bytes) rather than an ANSI character (1 byte)? If so, would Delphi convert the character from wide to ANSI under the hood when AnsiCompareStr and CompareText are called, and could that be where the issue lies?
As I said, I don’t know; I’m just asking the question.
You are comparing to -1, which is wrong: CompareText returns the difference between the first two non-equal characters, so you have to check for less than zero.
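To make the point concrete, here is a hand-written sketch of the comparison described above. AsciiCompareText is a hypothetical name and this is not the RTL source: it folds ASCII 'a'..'z' to upper case and returns the code difference at the first mismatch (otherwise the length difference). For characters that are not adjacent in the code table, such as '"' (34) and '$' (36), the result is -2, so a test for exactly -1 fails even though the order is correct.

```pascal
program CompareTextModel;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical model of an ASCII case-insensitive comparison that returns
// a difference rather than -1/0/1. Not the actual RTL implementation.
function AsciiCompareText(const S1, S2: string): Integer;
var
  i, n: Integer;
  c1, c2: Char;
begin
  n := Length(S1);
  if Length(S2) < n then
    n := Length(S2);
  for i := 1 to n do
  begin
    c1 := S1[i];
    c2 := S2[i];
    // Fold ASCII lowercase to uppercase; nothing else is touched.
    if (c1 >= 'a') and (c1 <= 'z') then
      c1 := Chr(Ord(c1) - 32);
    if (c2 >= 'a') and (c2 <= 'z') then
      c2 := Chr(Ord(c2) - 32);
    if c1 <> c2 then
      Exit(Ord(c1) - Ord(c2));  // difference of the first mismatching codes
  end;
  Result := Length(S1) - Length(S2);
end;

begin
  WriteLn(AsciiCompareText('"', '$'));       // -2 ('"' = 34, '$' = 36)
  WriteLn(AsciiCompareText('"', '$') = -1);  // FALSE: why "= -1" fails
  WriteLn(AsciiCompareText('"', '$') < 0);   // TRUE: test the sign instead
  WriteLn(AsciiCompareText('a', 'A'));       // 0: case-insensitive
end.
```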
Insert from the SysUtils string helper is zero-based, so you are inserting at the wrong position: the loop indexes s[i] from 1, and s.Insert(i, ch) puts ch after s[i] instead of before it.
After fixing that, we can see that CompareText sorts in exact ASCII order, treating lowercase letters as uppercase letters (as simply reading its source would have shown).
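Applying both fixes to injectC gives something like the sketch below. It is an illustration, not the poster’s code: the program name is made up, the procedure takes a string instead of a Char, and it uses System.Insert, which is 1-based like s[i], instead of the zero-based helper Insert.

```pascal
program InjectFixed;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Inserts ch into the already-sorted string s so that s stays ordered by
// CompareText. Differences from injectC:
//  * the sign of the result is tested (< 0) instead of comparing with -1;
//  * System.Insert is 1-based, matching the 1-based s[i] in the loop.
procedure InjectC(const ch: string; var s: string);
var
  i, c: Integer;
begin
  for i := 1 to Length(s) do
  begin
    c := CompareText(ch, s[i]);
    if c = 0 then
      Exit                  // an equal character (e.g. 'A' for 'a') is already there
    else if c < 0 then
    begin
      Insert(ch, s, i);     // ch now sits immediately before the old s[i]
      Exit;
    end;
  end;
  s := s + ch;              // ch sorts after every character so far
end;

function BuildC: string;
var
  i: Integer;
begin
  Result := '';
  for i := 32 to 126 do
    InjectC(Chr(i), Result);
end;

begin
  WriteLn('CompareText: ''', BuildC, '''');
  // Expected: CompareText: ' !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`{|}~'
end.
```

The lowercase letters are dropped because CompareText reports them equal to 'A'..'Z'. Note that because the characters arrive in ascending code order, CompareText never reaches the insert branch here, which is why the original CompareText line already printed the same string.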
AnsiCompareStr, which on Windows calls CompareString, is a different beast and behaves in ways you might not expect; see what Raymond Chen has to say about this.
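To see the CompareString behavior directly, here is a Windows-only sketch that compares '-' with '0' twice: once with default flags and once with SORT_STRINGSORT, which Microsoft documents as treating punctuation the same as symbols. The program name and the WinCompare helper are hypothetical, the two constants are declared locally with their winnls.h values, and the exact flags the RTL passes may differ between Delphi versions. The results depend on the user locale, so none are shown.

```pascal
program CompareStringFlags;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Windows only: calls the CompareString API directly.
uses
  {$IFDEF FPC}Windows{$ELSE}Winapi.Windows{$ENDIF};

const
  SortStringSort = $00001000; // SORT_STRINGSORT in winnls.h
  CstrEqual = 2;              // CSTR_EQUAL; CompareString returns 1, 2 or 3, or 0 on error

// Hypothetical helper: maps CompareString's 1/2/3 result to -1/0/1.
// An API error (result 0) would show up as -2 and is not handled here.
function WinCompare(const S1, S2: string; Flags: DWORD): Integer;
begin
  Result := CompareString(LOCALE_USER_DEFAULT, Flags,
    PChar(S1), Length(S1), PChar(S2), Length(S2)) - CstrEqual;
end;

begin
  // Locale-dependent results, so they are printed rather than assumed.
  WriteLn('default flags   "-" vs "0": ', WinCompare('-', '0', 0));
  WriteLn('SORT_STRINGSORT "-" vs "0": ', WinCompare('-', '0', SortStringSort));
end.
```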
Good point about the return value, but it doesn’t make any difference (I just tested). And Insert is 0-based, which is why the character is inserted at the position of the first character after it.
As for AnsiCompareStr… obviously I have now learnt that this is the case. After 20 years. But why is it called AnsiCompareStr… I’d feel dumber if it weren’t for that.
“ANSI encoding” is a somewhat generic term for the default code page on a system, usually Windows. On Western/U.S. systems it more properly refers to Windows-1252; on other systems it can mean a different Windows code page. Windows-1252 is essentially an extension of ASCII: it contains all 128 ASCII characters plus 128 additional character codes. That is possible because “ANSI” encodings are 8-bit, whereas ASCII is 7-bit (nowadays ASCII is almost always stored in 8-bit bytes with the MSB set to 0). See the article for an explanation of why this encoding is usually called ANSI.