V1076. Code contains invisible characters that may alter its logic. Consider enabling the display of invisible characters in the code editor.
The analyzer has detected characters in the code that may mislead the developer. These characters can be invisible and can change how the code is displayed in an IDE. As a result, the developer and the compiler may interpret the same code differently.
Such characters may be inserted on purpose. This type of attack is called Trojan Source. To learn more:
- Trojan Source attack for introducing invisible vulnerabilities;
- Trojan Source: Invisible Vulnerabilities.
The analyzer issues a warning if it finds one of the following characters:
Character | Code | Definition | Description |
---|---|---|---|
LRE | U+202A | LEFT-TO-RIGHT EMBEDDING | The text after the LRE character is treated as embedded and displayed left-to-right. The effect of LRE is terminated by the PDF character or a newline character. |
RLE | U+202B | RIGHT-TO-LEFT EMBEDDING | The text after the RLE character is treated as embedded and displayed right-to-left. The effect of RLE is terminated by the PDF character or a newline character. |
LRO | U+202D | LEFT-TO-RIGHT OVERRIDE | The text after the LRO character is forcibly displayed left-to-right. The effect of LRO is terminated by the PDF character or a newline character. |
RLO | U+202E | RIGHT-TO-LEFT OVERRIDE | The text after the RLO character is forcibly displayed right-to-left. The effect of RLO is terminated by the PDF character or a newline character. |
PDF | U+202C | POP DIRECTIONAL FORMATTING | The PDF character terminates the effect of one of the LRE, RLE, LRO, or RLO characters encountered earlier. It terminates only the most recent such character. |
LRI | U+2066 | LEFT-TO-RIGHT ISOLATE | The text after the LRI character is displayed left-to-right and treated as isolated. This means that other control characters do not affect the display of this text fragment. The effect of LRI is terminated by the PDI character or a newline character. |
RLI | U+2067 | RIGHT-TO-LEFT ISOLATE | The text after the RLI character is displayed right-to-left and treated as isolated. This means that other control characters do not affect the display of this text fragment. The effect of RLI is terminated by the PDI character or a newline character. |
FSI | U+2068 | FIRST STRONG ISOLATE | The direction of the text after the FSI character is determined by the first strong directional character in that text that is not inside a nested isolate. Other control characters do not affect the display of this text. The effect of FSI is terminated by the PDI character or a newline character. |
PDI | U+2069 | POP DIRECTIONAL ISOLATE | The PDI character terminates the effect of one of the LRI, RLI, or FSI characters encountered earlier. It terminates only the most recent such character. |
LRM | U+200E | LEFT-TO-RIGHT MARK | The text after the LRM character is displayed left-to-right. The effect of LRM is terminated by a newline character. |
RLM | U+200F | RIGHT-TO-LEFT MARK | The text after the RLM character is displayed right-to-left. The effect of RLM is terminated by a newline character. |
ALM | U+061C | ARABIC LETTER MARK | The text after the ALM character is displayed right-to-left. The effect of ALM is terminated by a newline character. |
ZWSP | U+200B | ZERO WIDTH SPACE | An invisible space character. ZWSP makes different strings look identical. For example, 'str[ZWSP]ing' is displayed as 'string'. |
Look at the following code fragment:
#include <iostream>
int main()
{
  bool isAdmin = false;
  /*[RLO] } [LRI] if (isAdmin)[PDI] [LRI] begin admins only */ // (1)
    std::cout << "You are an admin.\n";
  /* end admins only [RLO]{ [LRI]*/ // (2)
  return 0;
}
Let's look closer at line (1).
[LRI] if (isAdmin)[PDI]
Here the [LRI] character is in effect up to the [PDI] character. The 'if (isAdmin)' string is displayed left-to-right as an isolated fragment, so we get 'if (isAdmin)'.
[LRI] begin admins only */
Here the [LRI] character is in effect up to the end of the line. We get an isolated fragment: 'begin admins only */'.
[RLO] {space1}, '}', {space2}, 'if (isAdmin)', 'begin admins only */'
Here the [RLO] character is in effect up to the end of the line and displays the text right-to-left. Each isolated fragment obtained in the previous steps is treated as a single indivisible character. We get the following sequence:
'begin admins only */', 'if (isAdmin)', {space2}, '{', {space1}
Note that the closing brace is now displayed as '{' instead of '}': paired brackets are mirrored when they are displayed right-to-left.
As a result, the editor may display line (1) like this:
/* begin admins only */ if (isAdmin) {
Similar transformations affect line (2), which is displayed like this:
/* end admins only */ }
The editor may display the whole code fragment like this:
#include <iostream>
int main()
{
  bool isAdmin = false;
  /* begin admins only */ if (isAdmin) {
    std::cout << "You are an admin.\n";
  /* end admins only */ }
  return 0;
}
A reviewer may think that the 'isAdmin' flag is checked before the message is displayed. They will skip the comments and assume that the code is executed like this:
#include <iostream>
int main()
{
  bool isAdmin = false;
  if (isAdmin) {
    std::cout << "You are an admin.\n";
  }
  return 0;
}
However, there is no check. For the compiler, the code above looks like this:
#include <iostream>
int main()
{
  bool isAdmin = false;
  std::cout << "You are an admin.\n";
  return 0;
}
Now let's look at a simple yet dangerous example that uses invisible characters:
#include <string>
#include <string_view>
enum class BlockCipherType { DES, TripleDES, AES, /*....*/ };
constexpr BlockCipherType
StringToBlockCipherType(std::string_view str) noexcept
{
  if (str == "AES[ZWSP]")
    return BlockCipherType::AES;
  else if (str == "TripleDES[ZWSP]")
    return BlockCipherType::TripleDES;
  else
    return BlockCipherType::DES;
}
The 'StringToBlockCipherType' function converts a string to one of the values of the 'BlockCipherType' enumeration. It looks as if the function can return three different values, but it cannot. Because an invisible space character [ZWSP] is appended to each string literal, the comparisons with the strings 'AES' and 'TripleDES' evaluate to false. As a result, instead of three different values, the function returns only 'BlockCipherType::DES' for such input. At the same time, the code editor may display the code like this:
#include <string>
#include <string_view>
enum class BlockCipherType { DES, TripleDES, AES, /*....*/ };
constexpr BlockCipherType
StringToBlockCipherType(std::string_view str) noexcept
{
  if (str == "AES")
    return BlockCipherType::AES;
  else if (str == "TripleDES")
    return BlockCipherType::TripleDES;
  else
    return BlockCipherType::DES;
}
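The effect can be checked at compile time. The sketch below is a variant of the function above in which [ZWSP] is written as explicit UTF-8 byte escapes ('\xE2\x80\x8B'), so the hidden character stays visible in the source. It assumes C++17 and a UTF-8 encoding of the compared strings. The 'usageSketch' helper is hypothetical.

```cpp
#include <cassert>
#include <string_view>

enum class BlockCipherType { DES, TripleDES, AES };

// Same logic as the fragment above; "\xE2\x80\x8B" is ZWSP (U+200B) in UTF-8.
constexpr BlockCipherType
StringToBlockCipherType(std::string_view str) noexcept
{
    if (str == "AES\xE2\x80\x8B")
        return BlockCipherType::AES;
    else if (str == "TripleDES\xE2\x80\x8B")
        return BlockCipherType::TripleDES;
    else
        return BlockCipherType::DES;
}

// The literal with ZWSP is three bytes longer than the one a reader sees.
static_assert(std::string_view("AES").size() == 3);
static_assert(std::string_view("AES\xE2\x80\x8B").size() == 6);

// The names a caller would normally pass both fall through to DES.
static_assert(StringToBlockCipherType("AES") == BlockCipherType::DES);
static_assert(StringToBlockCipherType("TripleDES") == BlockCipherType::DES);

// Only input that also ends with ZWSP reaches the AES branch.
inline void usageSketch()
{
    assert(StringToBlockCipherType("AES\xE2\x80\x8B") == BlockCipherType::AES);
}
```

Because the function is 'constexpr', the 'static_assert' lines fail the build if the comparison behaves differently from what is described here.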
If the analyzer issues a warning about invisible characters in the code, turn on the display of invisible characters in your editor and make sure that they do not change the program's logic.
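If the editor cannot display such characters, a small helper can make them visible. The sketch below replaces each character from the table with a tag in the notation used in this description ('[RLO]', '[ZWSP]', and so on). It assumes well-formed UTF-8 input. The names 'Tag', 'kTags', 'revealInvisible', and 'usageSketch' are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// UTF-8 byte sequence of a character from the table and the tag shown instead.
struct Tag {
    std::string_view bytes;
    std::string_view label;
};

constexpr Tag kTags[] = {
    {"\xE2\x80\xAA", "[LRE]"}, {"\xE2\x80\xAB", "[RLE]"},
    {"\xE2\x80\xAC", "[PDF]"}, {"\xE2\x80\xAD", "[LRO]"},
    {"\xE2\x80\xAE", "[RLO]"}, {"\xE2\x81\xA6", "[LRI]"},
    {"\xE2\x81\xA7", "[RLI]"}, {"\xE2\x81\xA8", "[FSI]"},
    {"\xE2\x81\xA9", "[PDI]"}, {"\xE2\x80\x8E", "[LRM]"},
    {"\xE2\x80\x8F", "[RLM]"}, {"\xD8\x9C", "[ALM]"},
    {"\xE2\x80\x8B", "[ZWSP]"},
};

// Returns a copy of the text in which every listed character is replaced by
// its tag. Matching raw bytes is safe in well-formed UTF-8 because a lead
// byte never occurs in the middle of another character.
std::string revealInvisible(std::string_view utf8)
{
    std::string out;
    out.reserve(utf8.size());
    std::size_t i = 0;
    while (i < utf8.size()) {
        bool replaced = false;
        for (const Tag& tag : kTags) {
            if (utf8.substr(i, tag.bytes.size()) == tag.bytes) {
                out.append(tag.label);
                i += tag.bytes.size();
                replaced = true;
                break;
            }
        }
        if (!replaced)
            out += utf8[i++];               // ordinary byte: copy as is
    }
    return out;
}

// Usage sketch: a literal that ends with ZWSP becomes visibly different.
inline void usageSketch()
{
    assert(revealInvisible("AES\xE2\x80\x8B") == "AES[ZWSP]");
}
```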
This diagnostic is classified as: