A UNICODE_STRING structure or an AnsiString?

Please correct me if I have this wrong:
is a Delphi AnsiString the same thing as a C++ UNICODE_STRING?

Referencing
https://learn.microsoft.com/en-us/windows/win32/api/ntdef/ns-ntdef-_unicode_string
it says:

typedef struct _UNICODE_STRING {
  USHORT Length;
  USHORT MaximumLength;
  PWSTR  Buffer;
} UNICODE_STRING, *PUNICODE_STRING;

Now, USHORT is a Delphi Word.

But from what I have read, a Delphi AnsiString uses Integers to hold the size of the string.
So is something wrong here?
What I'm looking for is a record that describes a Delphi AnsiString, and even better, the record itself in a Delphi *.pas file that I can reference.
Does an AnsiString have an integer containing the length of the string?
Does an AnsiString have an integer containing the size of its data space?
Does an AnsiString have a pointer to its data space?
So it's a 12-byte record?

And with this record, what is the safest way to literally copy the complete record to another location?
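For comparison, the UNICODE_STRING structure quoted above can be sketched in C++ without windows.h. This is only a model: `CountedUtf16` and `describe` are hypothetical names, and `char16_t` stands in for the WCHAR behind `PWSTR`. The point it shows is that `Length` and `MaximumLength` count bytes, not characters, and that there is no reference count or code page, so it is a different thing from a Delphi AnsiString.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the Windows UNICODE_STRING layout:
// two 16-bit byte counts followed by a pointer to UTF-16 code units.
struct CountedUtf16 {
    std::uint16_t Length;         // bytes in use, excluding any terminator
    std::uint16_t MaximumLength;  // total bytes available in Buffer
    char16_t*     Buffer;         // not required to be zero-terminated
};

// Describe an existing UTF-16 buffer.
// Assumes the text is small enough for 16-bit byte counts.
CountedUtf16 describe(std::u16string& text) {
    CountedUtf16 s;
    s.Length = static_cast<std::uint16_t>(text.size() * sizeof(char16_t));
    // std::u16string keeps a terminator after size(), so count it as capacity.
    s.MaximumLength =
        static_cast<std::uint16_t>((text.size() + 1) * sizeof(char16_t));
    s.Buffer = &text[0];
    return s;
}
```

Because of alignment padding before the pointer, `sizeof(CountedUtf16)` is typically 8 on 32-bit targets and 16 on 64-bit targets.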

Is this what you are looking for?

https://docwiki.embarcadero.com/RADStudio/Athens/en/Internal_Data_Formats_(Delphi)#Long_String_Types

https://docwiki.embarcadero.com/RADStudio/Athens/en/Unicode_in_RAD_Studio#New_String_Type:_UnicodeString

contains

type StrRec = record
  CodePage: Word;
  ElemSize: Word;
  refCount: Integer;
  Len: Integer;
  case Integer of
    1: array[0..0] of AnsiChar;
    2: array[0..0] of WideChar;
end;
So AnsiChar is one byte and WideChar is 4 bytes long, correct?
Or do I read the ElemSize?
Or is it easier to use Length(MyStrRec) to get the full length in bytes, and take the pointer of MyStrRec to copy the string manually?
And if I did the manual copy in C++, what would the record look like?
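A C++ mirror of the header could look like the sketch below. It assumes the field layout shown in the docwiki record above; 64-bit compilers may insert padding before CodePage, so check the StrRec declaration in System.pas for your compiler version. `DelphiStrHeader`, `payloadBytes` and `headerBefore` are hypothetical names. The characters are not inside the header: they start right after it, and their size in bytes is Len * ElemSize, followed by one zero element as terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical C++ mirror of the StrRec header fields (32-bit layout assumed).
// The text is NOT part of this struct; it follows directly after it.
struct DelphiStrHeader {
    std::uint16_t CodePage;  // e.g. 1252, 65001 (UTF-8), 1200 (UTF-16)
    std::uint16_t ElemSize;  // 1 for AnsiString, 2 for UnicodeString
    std::int32_t  RefCount;  // owned by the RTL, never copy it blindly
    std::int32_t  Len;       // number of elements, not bytes
};

static_assert(sizeof(DelphiStrHeader) == 12, "no padding in this layout");

// Bytes occupied by the text, excluding the terminating zero element.
std::size_t payloadBytes(const DelphiStrHeader& h) {
    return static_cast<std::size_t>(h.Len) * h.ElemSize;
}

// Read the header that sits immediately before a payload pointer.
// memcpy avoids the alignment and aliasing issues of reinterpret_cast.
DelphiStrHeader headerBefore(const unsigned char* payload) {
    DelphiStrHeader h;
    std::memcpy(&h, payload - sizeof h, sizeof h);
    return h;
}
```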

No, there is so much I'm not getting yet. As shown below, I cannot do this in Delphi:
2: (boo: Boolean); is fine, but 1: (AnsString: AnsiString); is not OK.
So I'm looking for a way to access the AnsString in code, and to create my own record to put in this record. In my case, the string can be put into a memory stream for storage.

MyRecord = Record
  case Integer of
    1: (AnsString: AnsiString);  // not allowed: AnsiString needs finalization
    2: (boo: Boolean);
end;

1. WideChar is 2 bytes (UTF-16).

2. Why, though?

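// In C++ ???
#include <iostream>
#include <string>

char a[6] = {65,66,67,68,69,0};   // "ABCDE" plus terminator; Delphi strings are zero-terminated too (I believe)

int main() {
   char* s1 = a;          // s1 points at the same memory as a
   std::string s2 = a;    // s2 owns its own copy of the characters

   a[4] = 'Z';
   std::cout << s1 << " " << s2 << " \n";

// output : ABCDZ ABCDE
}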

Does the AnsiString come with a pointer to its storage location, and a way to identify the size of memory used up to the zero terminator?
And then, how do I create that same space in Delphi so Delphi can access it after a CopyMemory or something like that?
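For putting a string into a memory stream, a common safe pattern is to copy the characters rather than the string record: write the length, then the bytes, and build a fresh string when reading back. The C++ sketch below assumes a `std::vector<unsigned char>` as a stand-in for a memory stream, and `writeString`/`readString` are hypothetical names. Because the length is stored explicitly, nothing depends on scanning for the zero terminator, and embedded #0 characters survive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Append a string as a 4-byte little-endian length followed by the raw bytes.
// No terminator, reference count or pointer is written.
void writeString(std::vector<unsigned char>& stream, const std::string& s) {
    const std::uint32_t n = static_cast<std::uint32_t>(s.size());
    for (int i = 0; i < 4; ++i)
        stream.push_back(static_cast<unsigned char>((n >> (8 * i)) & 0xFF));
    stream.insert(stream.end(), s.begin(), s.end());
}

// Read a string written by writeString, starting at pos; advances pos.
std::string readString(const std::vector<unsigned char>& stream, std::size_t& pos) {
    if (pos > stream.size() || stream.size() - pos < 4)
        throw std::runtime_error("truncated length");
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n |= static_cast<std::uint32_t>(stream[pos + i]) << (8 * i);
    pos += 4;
    if (stream.size() - pos < n)
        throw std::runtime_error("truncated data");
    // The new std::string allocates and owns its own copy of the bytes.
    std::string s(reinterpret_cast<const char*>(stream.data() + pos), n);
    pos += n;
    return s;
}
```

In Delphi the same idea would typically use TStream.WriteBuffer and ReadBuffer, with SetLength on the receiving string before reading the characters into it.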

A couple of references:

https://docwiki.embarcadero.com/RADStudio/Athens/en/String_Types_(Delphi)

The following is worth reading:

https://docwiki.embarcadero.com/Libraries/Athens/en/System.AnsiString

Perhaps also handy
https://www.delphipower.xyz/handbook_2009/the_internal_structure_of_strings.html

A UnicodeString, also known as string, contains UTF-16 code units ("characters" is not quite the right term here).
Most characters take 2 bytes, but some take 4 bytes, stored as a surrogate pair of two UTF-16 code units.
Slightly more than 1 million different code points can be represented.
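The difference between UTF-16 code units (what Length counts for a string) and the characters a user sees can be shown with a small C++ sketch. Here `std::u16string` stands in for a UnicodeString, and `countCodePoints` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count Unicode code points in UTF-16 text: a high surrogate (D800-DBFF)
// followed by a low surrogate (DC00-DFFF) counts as a single code point.
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        const bool high = c >= 0xD800 && c <= 0xDBFF;
        if (high && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            ++i;  // skip the low half of the surrogate pair
        ++count;
    }
    return count;
}
```

For example, "A" followed by U+1F600 is 3 code units (6 bytes) but only 2 code points.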

If you have an AnsiString, you can cast it directly to a PAnsiChar.
Similarly, a string (UnicodeString) can be cast to PChar.

This works because an AnsiString is effectively a pointer to the actual string contents (except when it is an empty string, in which case it is nil).
For a non-empty AnsiString, the pointer points at the text itself, while the metadata (length, code page, reference count, etc.) sits just before Pointer(YourAnsiString).

AnsiString S1
[metadata]S1PointsHere->[this is the text][null]
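The layout in that diagram can be simulated in C++. This is only a model: `Header`, `makeAnsi`, `lengthOf` and `freeAnsi` are hypothetical names, the header assumes the StrRec fields quoted earlier in the thread, and the real RTL uses its own memory manager and reference counting. It illustrates why a cast to PAnsiChar works: the variable already points at zero-terminated text, and the metadata is found at a negative offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical 12-byte header modeled on the StrRec fields.
struct Header {
    std::uint16_t codePage;
    std::uint16_t elemSize;
    std::int32_t  refCount;
    std::int32_t  length;
};

// Allocate [header][text][0] and return a pointer to the text.
// An empty string is represented by nullptr, as in Delphi.
char* makeAnsi(const char* text, std::uint16_t codePage) {
    const std::size_t n = std::strlen(text);
    if (n == 0) return nullptr;
    char* block = static_cast<char*>(std::malloc(sizeof(Header) + n + 1));
    if (!block) throw std::bad_alloc();
    Header h{codePage, 1, 1, static_cast<std::int32_t>(n)};
    std::memcpy(block, &h, sizeof h);
    std::memcpy(block + sizeof h, text, n + 1);  // includes the terminator
    return block + sizeof h;
}

// The metadata lives just before the pointer the variable holds.
std::int32_t lengthOf(const char* s) {
    if (!s) return 0;
    Header h;
    std::memcpy(&h, s - sizeof h, sizeof h);
    return h.length;
}

// Free the whole block, starting from the header.
void freeAnsi(char* s) {
    if (s) std::free(s - sizeof(Header));
}
```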


In a record like this, you would usually have a field that tells you what kind of data the record currently contains, so you can update it and retrieve data from it correctly.

Delphi creates this field for you if you write: case myCurrentVarType: Integer of…

You could do something like:

MyRecord = Record
  ansString: AnsiString;
  case myCurrentVarType: Integer of
    1: (aPAnsiChar: PAnsichar);
    2: (boo: Boolean);
End;

Then, in your code:

if theRecord.myCurrentVarType = 1 then 
  myStringValue := theRecord.ansString;

You don't use the aPAnsiChar field at all
(but you sort of need it as a placeholder).
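The same tagged-record idea can be sketched in C++. `MyRecord` and `tryGetString` here are hypothetical counterparts of the Delphi record above. Keeping the string outside the union mirrors the Delphi workaround: much as Delphi refuses managed types in a variant part, a C++ union containing `std::string` loses its implicit constructor and destructor unless you write them yourself.

```cpp
#include <cassert>
#include <string>

// Hypothetical C++ counterpart of MyRecord: the managed string lives
// outside the union, and the tag says which union member is valid.
struct MyRecord {
    std::string ansString;    // like the AnsiString field above the case part
    int myCurrentVarType = 0; // the tag field
    union {
        const char* aPAnsiChar = nullptr;  // placeholder, as in the Delphi record
        bool boo;
    };
};

// Returns the string only when the tag says the record holds one.
bool tryGetString(const MyRecord& r, std::string& out) {
    if (r.myCurrentVarType != 1) return false;
    out = r.ansString;
    return true;
}
```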

Thanks for the link below, it's got some meat for me:

https://www.delphipower.xyz/handbook_2009/the_internal_structure_of_strings.html

One thought.

You should probably use string rather than AnsiString.
It holds all character codes and doesn't vary with the regional settings.
There is usually very little overhead copying it from another string, and it can often be passed almost directly to the Windows APIs.

An AnsiString needs conversion to and from a string, usually can't hold all character codes, and its default code page varies with your regional settings.
UTF8String is an AnsiString type that is region-independent and does hold all character codes, which makes it very good for web work.
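To see why UTF-8 can hold every character while a single-byte code page cannot, here is a small C++ sketch of UTF-8 encoding; `encodeUtf8` is a hypothetical helper. Characters outside ASCII take 2 to 4 bytes, so the length of a UTF-8 string counts bytes, not characters.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Assumes cp is a valid scalar value (<= 0x10FFFF and not a surrogate).
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, "A" is 1 byte, "é" (U+00E9) is 2 bytes, "€" (U+20AC) is the 3 bytes E2 82 AC, and U+1F600 is 4 bytes.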

UTF8String: I will look into that, and may have more solid data on it too.