I just fixed a bug in our XML exporter to handle illegal characters. My initial research determined that ampersands, greater-than signs, less-than signs and double quotes have to be escaped before they are added to a string in an XML line.
I implemented this successfully, only to discover that a single quote is also an illegal XML character.
How do you deal with the double whammy of Delphi and XML here?
// for XML
// & = &amp;
// < = &lt;
// > = &gt;
// " = &quot;
// ' = &apos; // oh no
// cIllegalXmlChars = '&<>"';
function IsValidXMLChar(wc: WideChar): Boolean;
begin
  // Ranges follow the XML 1.0 Char production. Form feed ($000C) is not a
  // legal XML 1.0 character, so it is not listed here.
  case Word(wc) of
    $0009, $000A, $000D,
    $0020..$D7FF,
    $E000..$FFFD, // Standard Unicode chars below $FFFF
    $D800..$DBFF, // High surrogate of Unicode character = $10000 - $10FFFF
    $DC00..$DFFF: // Low surrogate of Unicode character = $10000 - $10FFFF
      Result := True;
  else
    Result := False;
  end;
end;
function StripInvalidXML(const s: string): string;
var
  i, count: Integer;
begin
  count := Length(s);
  SetLength(Result, count);
  for i := 1 to count do
  begin
    // Replace anything that is not a legal XML character with a space
    if IsValidXMLChar(WideChar(s[i])) then
      Result[i] := s[i]
    else
      Result[i] := ' ';
  end; // for
end;
function EscapeForXML(const value: string; const isAttribute: Boolean = True; const isCDATASection: Boolean = False): string;
begin
  Result := StripInvalidXML(value);
  if isCDATASection then
  begin
    // A CDATA section must not contain its own terminator
    Result := StringReplace(Result, ']]>', ']>', [rfReplaceAll]);
    Exit;
  end;
  // note we are avoiding replacing &amp; with &amp;amp; !!
  // Existing &amp; is parked in a placeholder, bare & is escaped, then the placeholder is restored.
  Result := StringReplace(Result, '&amp;', '[[-xy-amp--]]', [rfReplaceAll]);
  Result := StringReplace(Result, '&', '&amp;', [rfReplaceAll]);
  Result := StringReplace(Result, '[[-xy-amp--]]', '&amp;', [rfReplaceAll]);
  Result := StringReplace(Result, '<', '&lt;', [rfReplaceAll]);
  Result := StringReplace(Result, '>', '&gt;', [rfReplaceAll]);
  if isAttribute then
  begin
    // '''' is a single apostrophe in Delphi source
    Result := StringReplace(Result, '''', '&#39;', [rfReplaceAll]);
    Result := StringReplace(Result, '"', '&quot;', [rfReplaceAll]);
  end;
end;
Probably not the most efficient, but it has been working fine in FinalBuilder for years (XE7; it won't work in non-Unicode versions of Delphi).
Quotes and apostrophes are only illegal in tag and attribute content, not in text. See also the useful tips in The XML FAQ: "What are the special characters in XML?"
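To illustrate the text versus attribute distinction, here is a minimal sketch. The helper names `EscapeXmlText` and `EscapeXmlAttr` are made up for this example and are not part of any library; it assumes a Unicode Delphi (or Free Pascal in Delphi mode). It also shows the Delphi side of the problem: a literal apostrophe inside a Delphi string is written by doubling it.

```pascal
program XmlEscapeDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: escapes what element text needs.
function EscapeXmlText(const s: string): string;
begin
  // '&' goes first so the entities added below are not escaped again.
  Result := StringReplace(s, '&', '&amp;', [rfReplaceAll]);
  Result := StringReplace(Result, '<', '&lt;', [rfReplaceAll]);
  Result := StringReplace(Result, '>', '&gt;', [rfReplaceAll]);
end;

// Hypothetical helper: attribute values additionally need the quotes escaped.
function EscapeXmlAttr(const s: string): string;
begin
  Result := EscapeXmlText(s);
  Result := StringReplace(Result, '"', '&quot;', [rfReplaceAll]);
  // '''' is Delphi source for a string holding one apostrophe.
  Result := StringReplace(Result, '''', '&apos;', [rfReplaceAll]);
end;

begin
  Writeln('<note who="', EscapeXmlAttr('Tom "T" O''Neil'), '">',
          EscapeXmlText('it''s "fine" & 1 < 2'), '</note>');
  // prints: <note who="Tom &quot;T&quot; O&apos;Neil">it's "fine" &amp; 1 &lt; 2</note>
end.
```

In the element text the quotes and the apostrophe pass through untouched, while in the attribute value they are replaced.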
I personally use UTF-8 (numeric) notation for the special characters & < >, for example &#38; for &.
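If that notation means numeric character references (an assumption on my part), a rough sketch could look like the following. `EscapeAsNumericRefs` is a hypothetical name, not an existing RTL function. Because it walks the string once, the `&` it emits is never escaped a second time.

```pascal
program NumericRefDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: writes each special character as a decimal
// numeric character reference, e.g. '&' becomes '&#38;'.
function EscapeAsNumericRefs(const s: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(s) do
    case s[i] of
      '&', '<', '>', '"', '''':
        Result := Result + '&#' + IntToStr(Ord(s[i])) + ';';
    else
      Result := Result + s[i];
    end;
end;

begin
  Writeln(EscapeAsNumericRefs('a & b < "c"'));
  // prints: a &#38; b &#60; &#34;c&#34;
end.
```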
Regards,