CompareText and AnsiCompareStr

I ran into a bug that suggested that the sort order of strings differs on different platforms for Delphi and FPC. Impossible to imagine, but the bug seemed pretty clear. So I wrote this test:

procedure test;
var
  s : String;
  i, j : integer;
begin
  s := ' "#$%&''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~';
  for i := 2 to length(s) do
    for j := 1 to i - 1 do
      AssertTrue(AnsiCompareStr(s[j], s[i]) = -1,
      'a '+inttostr(j)+' vs '+inttostr(i) +': '+inttostr(ord(s[j]))+' < '+inttostr(ord(s[i]))+' = '+ s[j]+' < '+s[i]+')');
  s := '"#$%&''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`{|}~';
  for i := 2 to length(s) do
    for j := 1 to i - 1 do
      AssertTrue(CompareText(s[j], s[i]) = -1,
       'c '+ inttostr(j)+' vs '+inttostr(i) +': '+inttostr(ord(s[j]))+' < '+inttostr(ord(s[i]))+' = '+ s[j]+' < '+s[i]+')')
end;

Both loops fail for Delphi, though the first one passes for FPC. How can it fail? If the issue is not operator error (how? I copied an ASCII chart), then what’s going on?

Here are ChatGPT-4’s suggestions:

It is indeed surprising to see that the test fails for Delphi. Based on the test code you provided, it appears that you’re attempting to verify that AnsiCompareStr and CompareText both produce the expected sorting order for a given string of ASCII characters.

However, there are a few potential reasons why the test might fail on Delphi:

  1. Locale-specific behavior: AnsiCompareStr is locale-sensitive, meaning it takes into account the current system locale for sorting. Different locales may have different sorting rules, which could affect the behavior of AnsiCompareStr. This could be a reason why the first loop fails for Delphi, while it passes for FPC, as FPC might not be taking the locale into account in the same way.
  2. Case sensitivity: CompareText is case-insensitive, whereas AnsiCompareStr is case-sensitive. It’s possible that the case-insensitive comparison is causing unexpected results when sorting the string. However, based on the string you provided, it seems unlikely, as the characters are ordered by their ASCII values.

To investigate this issue further, you could try modifying the test to isolate the problematic characters and understand which part of the comparison is causing the test to fail. You could also consider using alternative comparison functions, like AnsiCompareText (case-insensitive) and AnsiStrComp (case-sensitive), to see if they produce the expected results. Additionally, you can check the system locale and try setting it to a neutral one, like “en-US”, to see if the issue is indeed related to locale-specific behavior.

Lastly, it’s also worth verifying that the issue is not caused by a specific Delphi version, by testing the code on different versions or asking others to test it on their systems.

Alright, let’s try something else. Given this program:

program Project1;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils;

procedure injectA(ch : char; var s : String);
var
  i, c : integer;
begin
  for i := 1 to length(s) do
  begin
    c := AnsiCompareStr(ch, s[i]);
    if (c = 0) then
      exit
    else if (c = -1) then
    begin
      s.Insert(i, ch);
      exit;
    end;
  end;
  s := s + ch;
end;

procedure injectC(ch : char; var s : String);
var
  i, c : integer;
begin
  for i := 1 to length(s) do
  begin
    c := CompareText(ch, s[i]);
    if (c = 0) then
      exit
    else if (c = -1) then
    begin
      s.Insert(i, ch);
      exit;
    end;
  end;
  s := s + ch;
end;

function buildA : String;
var
  i : integer;
begin
  result := '';
  for i := 32 to 126 do
    injectA(chr(i), result);
end;

function buildC : String;
var
  i : integer;
begin
  result := '';
  for i := 32 to 126 do
    injectC(chr(i), result);
end;

procedure test;
begin
  writeln('AnsiCompareStr: '''+buildA+'''');
  writeln('CompareText: '''+buildC+'''');
end;

begin
  test;
  writeln('press return to close');
  readln;
end.


I get this output:

AnsiCompareStr: ' -'!"#$%&()*+~}|{`_^]\[@?;:/.,0>=<123456789AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
CompareText: ' !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`{|}~'
press return to close

This is Windows 11 under Parallels on an M1 Mac. Note particularly that the - character is out of ANSI order, and that’s the source of the bug that led me to this.


Does anyone get any different output from Delphi on any platform?

I just discovered that my locale is English/USA on this Windows install, so results from any other locale would be useful too. I’m particularly interested in the sort order of CompareText.

Just asking (because I don’t know the answer), but wouldn’t chr return a wide character (2 bytes) and not an ANSI character (single byte)? So Delphi, under the hood, would be converting the character from wide to ANSI when AnsiCompareStr and CompareText are called? If so, could this be where the issue lies?

As I said, I don’t know - just asking the question.

Two defects:

  • You are comparing the result to -1, which is wrong: CompareText returns the difference between the first two non-equal characters, so you have to check for less than zero.
  • Insert from the SysUtils string helper is zero-based, so you are inserting at the wrong position.

After fixing that, we can see that CompareText sorts exactly in ASCII order, treating lower-case letters as upper-case letters (as we could see from simply reading the code).
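For reference, here is a minimal sketch of injectC with both fixes applied. The names SortOrderSketch and InjectSorted are made up for illustration. It assumes the default 1-based string indexing and uses the classic System.Insert procedure, which is 1-based, instead of the zero-based string helper.

```pascal
program SortOrderSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical corrected variant of injectC from the program above.
procedure InjectSorted(ch: Char; var s: string);
var
  i, c: Integer;
begin
  for i := 1 to Length(s) do
  begin
    c := CompareText(ch, s[i]);
    if c = 0 then
      Exit  // a character that compares equal (ignoring case) is already present
    else if c < 0 then  // any negative value means "ch sorts before s[i]"
    begin
      Insert(ch, s, i);  // System.Insert takes a 1-based position, matching s[i]
      Exit;
    end;
  end;
  s := s + ch;  // ch sorts after everything collected so far
end;

var
  i: Integer;
  sorted: string;
begin
  sorted := '';
  for i := 32 to 126 do
    InjectSorted(Chr(i), sorted);
  Writeln('CompareText: ''' + sorted + '''');
end.
```

Swapping CompareText for AnsiCompareStr in InjectSorted gives the corresponding variant of injectA.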

AnsiCompareStr, which on Windows calls CompareString, is a different beast and behaves in a way that you might not expect. See what Raymond Chen has to say about this.
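A quick way to see the difference on your own machine is to compare the same pair of characters with CompareStr, which does a plain ordinal comparison, and with AnsiCompareStr. This is only a sketch: SignOf is a hypothetical helper, and the AnsiCompareStr result depends on the platform and locale, so no expected value is quoted for it.

```pascal
program OrdinalVsLocale;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: reduce a comparison result to -1, 0 or 1,
// because only the sign of the result is meaningful.
function SignOf(c: Integer): Integer;
begin
  if c < 0 then
    Result := -1
  else if c > 0 then
    Result := 1
  else
    Result := 0;
end;

begin
  // Ordinal comparison: '-' is #45 and ' ' is #32, so this is positive.
  Writeln('CompareStr(''-'', '' '')     : ', SignOf(CompareStr('-', ' ')));
  // Locale-aware comparison: the sign can differ from the ordinal one.
  Writeln('AnsiCompareStr(''-'', '' '') : ', SignOf(AnsiCompareStr('-', ' ')));
end.
```

If the sort order has to match the ASCII chart, CompareStr (case-sensitive) or CompareText (case-insensitive) is the safer choice.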

Good point about the return value, but it doesn’t make any difference (I just tested). And the insert is 0-based, which is why the character is inserted at the point of the first character after it.

No, I was wrong. Duh.

AnsiCompareStr: ''- !"#$%&()*,./:;?@[\]^_`{|}~+<=>0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ'
CompareText: ' !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`{|}~'

As for AnsiCompareStr… obviously I have now learnt that this is the case. After 20 years. But why is it called AnsiCompareStr… I’d feel dumber if it weren’t for that.


Because reasons.

Ansi <> ASCII

ANSI encoding is a slightly generic term used to refer to the standard code page on a system, usually Windows. It is more properly referred to as Windows-1252 on Western/U.S. systems. (It can represent certain other Windows code pages on other systems.) This is essentially an extension of the ASCII character set in that it includes all the ASCII characters with an additional 128 character codes. This difference is due to the fact that “ANSI” encoding is 8-bit rather than 7-bit as ASCII is (ASCII is almost always encoded nowadays as 8-bit bytes with the MSB set to 0). See the article for an explanation of why this encoding is usually referred to as ANSI.
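To make the 7-bit versus 8-bit point concrete, here is a small sketch that encodes a few characters as Windows-1252. It assumes a Unicode Delphi whose TEncoding provides GetEncoding(CodePage) and a system where code page 1252 is available. DumpBytes is a hypothetical helper.

```pascal
program AnsiVsAscii;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: print each byte of s, encoded with enc, in hex.
procedure DumpBytes(enc: TEncoding; const caption, s: string);
var
  bytes: TBytes;
  i: Integer;
begin
  bytes := enc.GetBytes(s);
  Write(caption, ':');
  for i := 0 to High(bytes) do
    Write(' $', IntToHex(bytes[i], 2));
  Writeln;
end;

var
  ansi: TEncoding;
begin
  // GetEncoding returns a new instance that the caller has to free.
  ansi := TEncoding.GetEncoding(1252);
  try
    DumpBytes(ansi, 'A      ', 'A');     // plain ASCII letter
    DumpBytes(ansi, 'e-acute', #$00E9);  // U+00E9, outside 7-bit ASCII
    DumpBytes(ansi, 'euro   ', #$20AC);  // U+20AC, outside 7-bit ASCII
  finally
    ansi.Free;
  end;
end.
```

'A' should come out as $41, the same byte as in ASCII, while U+00E9 and U+20AC map to $E9 and $80 in Windows-1252, both in the upper half that 7-bit ASCII does not define.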