Does anyone know whether Delphi has a fix for this (and in which versions)?
I think this has been discussed in the Discord community in relation to usernames: people pretending to have the same username as someone else by using non-spacing and similar-looking characters.
Regarding code, here is the output of a Rust linter, from a Twitter post:
And, BTW, here is the security paper: https://www.trojansource.codes/trojan-source.pdf
Whoa, I’m both impressed and appalled at the same time.
I was aware of similar-looking Unicode characters being used to fake domain names, but this is another level. What is the legitimate purpose of those directional Unicode control characters, i.e. what are they used for apart from dodginess?
It seems this is more far-reaching than just compilers.
I wonder if we’re going to have to start sanitising input for these characters?
Ahh, it’s actually in the name: they are used for bidirectional (bidi) text, e.g. right-to-left scripts such as Arabic.
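For reference, Python's standard `unicodedata` module can list the characters in question with their official names and bidi classes. The choice of these nine code points is an assumption based on the Trojan Source paper; the helper name is made up.

```python
import unicodedata

# Assumed set: the explicit embedding/override (U+202A..U+202E)
# and isolate (U+2066..U+2069) controls.
CODEPOINTS = [*range(0x202A, 0x202F), *range(0x2066, 0x206A)]


def describe_bidi_controls():
    """Return (label, Unicode name, bidi class) for each control character."""
    rows = []
    for cp in CODEPOINTS:
        ch = chr(cp)
        rows.append((f"U+{cp:04X}", unicodedata.name(ch), unicodedata.bidirectional(ch)))
    return rows


if __name__ == "__main__":
    for label, name, bidi_class in describe_bidi_controls():
        print(f"{label}  {bidi_class:<4} {name}")
```

The "override" pair (LRO/RLO) is what forces a run of characters to display in the opposite order to how they are stored, which is the core of the trick.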
BTW, Krebs’ post on it is really good: ‘Trojan Source’ Bug Threatens the Security of All Code – Krebs on Security
@Malcolm, do you know whether Embarcadero was one of the compiler vendors contacted before the 99-day embargo expired? Any word from them on when to expect a response?
No idea. I’ve sent them a question and will update when I know more.
Would I be hopelessly optimistic to think a fix for this issue was secretly put into last week’s hotfix?
Speaking more seriously, though, I wonder how far back this vulnerability goes in Delphi versions. Presumably Delphi 2009 and later are all potentially vulnerable. I’m fairly certain Delphi 7 and earlier won’t compile source code in Unicode files; I’m not sure about Delphi 2005 to Delphi 2007.
Pascal is more resistant to this than most languages, but it would be good (at a minimum) to raise a compile error whenever directional Unicode control characters are found anywhere in source code.
Also, it would be great to have an RTL routine to prevent them from appearing anywhere in code… Does anyone have one?
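Not an RTL routine, but here is a minimal sketch of such a check that could run as a standalone pre-build step. It's written in Python rather than Delphi, and `find_bidi_controls` and `make_visible` are hypothetical names. It assumes the source files are UTF-8 (with or without BOM); UTF-16/UTF-32 files aren't handled, and undecodable bytes are replaced rather than treated as errors. Each hit is reported with line and column, and the offending line is echoed with the controls rendered visibly.

```python
import sys

# Assumption: flag only the nine explicit directional controls.
BIDI_NAMES = {
    0x202A: "LRE", 0x202B: "RLE", 0x202C: "PDF", 0x202D: "LRO", 0x202E: "RLO",
    0x2066: "LRI", 0x2067: "RLI", 0x2068: "FSI", 0x2069: "PDI",
}


def find_bidi_controls(text):
    """Yield (line, column, abbreviation) for every bidi control; both are 1-based."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            abbr = BIDI_NAMES.get(ord(ch))
            if abbr is not None:
                yield line_no, col, abbr


def make_visible(line):
    """Replace each bidi control with a visible <U+XXXX> marker."""
    return "".join(f"<U+{ord(ch):04X}>" if ord(ch) in BIDI_NAMES else ch for ch in line)


def main(paths):
    """Report every hit; return 1 if anything was found (fails a build step)."""
    found = False
    for path in paths:
        # utf-8-sig skips a leading BOM if present.
        with open(path, encoding="utf-8-sig", errors="replace") as f:
            text = f.read()
        lines = text.splitlines()
        for line_no, col, abbr in find_bidi_controls(text):
            found = True
            print(f"{path}({line_no},{col}): bidi control {abbr}: "
                  f"{make_visible(lines[line_no - 1])}")
    return 1 if found else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Usage would be something like `python check_bidi.py Unit1.pas Unit2.pas` (the script name is arbitrary); the non-zero exit code lets it fail a build or CI step.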
‘As described’ (at least as I read it), it doesn’t seem to affect most ‘Delphi’ software shops at all, since the kind of person who would exploit such a vulnerability seems an unlikely employee of a small software house.
OTOH, the potential for creating (very) unobvious phishing URLs using this ‘vulnerability’ seems very real.
I think the most likely place this would be a risk for Delphi programmers would be within the source code of pirated third party components. Yet another reason not to visit dangerous places.
I also wonder: can the exploit survive a copy and paste from a web page into the IDE?
Fortunately, most browsers will be on top of this issue pretty quickly, if they aren’t already.
By default, Delphi source code does not support Unicode.
You have to enable it explicitly when saving files that contain Unicode characters. These source files then start with an EF BB BF signature (the UTF-8 BOM), so they can be easily identified.
Such files typically arise when you have Unicode constants in them.
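Since the BOM makes these files easy to spot, here is a sketch that sniffs the leading byte-order mark, including the UTF-16 and UTF-32 ones newer IDE versions can also write. The file patterns (`*.pas`, `*.dpr`, `*.inc`) and the function names are assumptions; adjust them for your projects.

```python
import codecs
from pathlib import Path

# Longest signatures first: the UTF-32 LE BOM (FF FE 00 00) starts with the
# UTF-16 LE BOM (FF FE), so checking UTF-16 first would misreport UTF-32 files.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),        # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(data):
    """Return the encoding named by a leading byte-order mark, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def unicode_sources(root, patterns=("*.pas", "*.dpr", "*.inc")):
    """Yield (path, encoding) for source files under root that start with a BOM."""
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            with open(path, "rb") as f:
                encoding = detect_bom(f.read(4))  # 4 bytes covers the longest BOM
            if encoding is not None:
                yield path, encoding
```

For example, `for path, enc in unicode_sources("."): print(path, enc)` lists the candidate files worth scanning for control characters. A BOM-less file that is nevertheless UTF-8 will not show up here.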
But aren’t there legitimate uses for them in some source code? I’m a little out of my depth with bidi languages, but if I understand their purpose, I can imagine Arabic-speaking developers using them in comments, string literals and possibly other places (identifiers?). I’ve seen enough Chinese/Japanese/Korean Delphi source to no longer be surprised by Unicode-sprinkled Pascal. I even used to work with someone who used emojis in his source.
It seems a fix would need to combine a) showing them clearly in the source and b) an error or warning when they appear outside legitimate places.
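One way to allow legitimate uses while still catching the trick is to warn only when an embedding, override or isolate is left open, which is roughly the idea behind GCC 12's `-Wbidi-chars=unpaired`. The sketch below is an assumption-laden simplification. It works per line, ignores the Unicode Bidirectional Algorithm's overflow limits, and `unpaired_bidi_lines` is a hypothetical name. A compiler would apply the check per comment or string literal instead, which also catches pairs that straddle a comment boundary.

```python
EMBEDDINGS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
ISOLATES = {"\u2066", "\u2067", "\u2068"}              # LRI, RLI, FSI
PDF, PDI = "\u202C", "\u2069"                          # the two closers


def unpaired_bidi_lines(text):
    """Return 1-based numbers of lines that leave a bidi embedding, override
    or isolate open at end of line (simplified pairing rules)."""
    bad = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        stack = []
        for ch in line:
            if ch in EMBEDDINGS or ch in ISOLATES:
                stack.append(ch)
            elif ch == PDF:
                # PDF closes the innermost embedding/override,
                # but has no effect if the innermost open item is an isolate.
                if stack and stack[-1] in EMBEDDINGS:
                    stack.pop()
            elif ch == PDI:
                # PDI closes the innermost open isolate plus any embeddings
                # opened inside it; with no open isolate it does nothing.
                if any(c in ISOLATES for c in stack):
                    while stack.pop() not in ISOLATES:
                        pass
        if stack:
            bad.append(line_no)
    return bad
```

Stray closers (a PDF or PDI with nothing to close) are ignored here because they don't change how the rest of the line is displayed.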
Gotta wonder if it’s all overkill, at least in compilers.
In the old mainframe PL/1 compiler you could put print control characters in column 1, and it was something of an art form to write programs that printed one way while compiling and executing another.
The senior programmer(s) always liked to work with listings, so you’d take your problem to them and have fun seeing how long it took them to discover the subterfuge.
I updated my post with the code from the C# ‘trojan’ example as shown in Notepad++, D11 and D10.2.
The control chars are evident.
You can’t even search for these easily in files: the code points are $202A–$202E and $2066–$2069, and if you search for their individual bytes ($20 is a space, $2E is “.”, $66 is “f” and $69 is “i”), there will be plenty of hits in all kinds of files.
Delphi saves Unicode files in UTF8, though, doesn’t it? So these codes would be different, e.g. E2 80 AE.
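To make the byte-level point concrete, here's a sketch that prints the exact UTF-8 and UTF-16LE bytes for each control and searches raw UTF-8 data for the complete sequences rather than single bytes. The names are hypothetical. Searching for the whole 3-byte UTF-8 sequence avoids the false hits described above because UTF-8 is self-synchronising. A naive byte search in UTF-16 does not have that property (a match can start at an odd offset), so UTF-16 files are better decoded first.

```python
# Assumed set: the nine explicit bidi controls, U+202A..U+202E and U+2066..U+2069.
BIDI_CHARS = [chr(cp) for cp in [*range(0x202A, 0x202F), *range(0x2066, 0x206A)]]

# Exact UTF-8 encodings, e.g. U+202E (RLO) -> E2 80 AE.
UTF8_PATTERNS = [ch.encode("utf-8") for ch in BIDI_CHARS]


def contains_bidi_utf8(data: bytes) -> bool:
    """True if raw UTF-8 bytes contain any complete bidi-control sequence."""
    return any(pattern in data for pattern in UTF8_PATTERNS)


def print_byte_table():
    """Show each control's bytes in UTF-8 and in UTF-16LE."""
    for ch in BIDI_CHARS:
        utf8 = ch.encode("utf-8").hex(" ").upper()
        utf16 = ch.encode("utf-16-le").hex(" ").upper()
        print(f"U+{ord(ch):04X}  UTF-8: {utf8:<8}  UTF-16LE: {utf16}")


if __name__ == "__main__":
    print_byte_table()
```

A hex-capable file search for `E2 80 AE` and its siblings should therefore work on UTF-8 sources, while in UTF-16LE the same character is just the two bytes `2E 20`.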
All versions still save in ANSI format by default.
You can choose to save as UTF8 with a BOM (also UTF16 and UTF32) in the last few versions. The IDE does not cope with UTF8 without a BOM (if the file actually contains Unicode characters).
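Given that the IDE doesn't cope with BOM-less UTF-8 that contains Unicode characters, here's a heuristic sketch for spotting such files and prefixing the BOM. `needs_bom` and `add_utf8_bom` are hypothetical helpers. The check is only a guess: ANSI text that happens to be valid UTF-8 would be misclassified, so keep backups before rewriting anything.

```python
import codecs


def needs_bom(data: bytes) -> bool:
    """Heuristic: valid UTF-8, contains non-ASCII characters, and no UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return False          # already has the EF BB BF signature
    if data.isascii():
        return False          # pure ASCII reads the same either way
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False          # not valid UTF-8, probably ANSI
    return True


def add_utf8_bom(path) -> bool:
    """Hypothetical fix-up: prefix the UTF-8 BOM if needs_bom() says so."""
    with open(path, "rb") as f:
        data = f.read()
    if not needs_bom(data):
        return False
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + data)
    return True
```

Rejecting invalid UTF-8 rather than guessing keeps ordinary ANSI files (for example, Latin-1 accented characters stored as single bytes) from being touched.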