What gets detected
Confusable characters (homoglyphs)
Characters from one script that look identical to characters from another. For example, Cyrillic ‘а’ (U+0430) looks identical to Latin ‘a’ (U+0061) but has a different code point. An attacker could write “pаypal.com” using a Cyrillic ‘а’ to make a malicious instruction look legitimate. Detection uses the Unicode confusables database. A character is flagged only when it appears in a word that mixes scripts. A fully Cyrillic word in a Russian sentence is not flagged.
Bidirectional override characters
Invisible Unicode control characters (U+202A through U+202E and U+2066 through U+2069) that change the direction in which text is rendered. These enable Trojan Source attacks, where the displayed text differs from the actual content.
What happens when deceptive characters are found
- The prompt text is automatically corrected:
  - Confusable characters are replaced with their Latin equivalents (e.g., Cyrillic ‘а’ → Latin ‘a’)
  - Bidirectional overrides are stripped
- A warning banner appears showing how many characters were removed
- You can click Undo to restore the original text if the correction was wrong
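The correction step can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name `sanitize_prompt`, the five-entry confusables map, and the rule "a word is mixed if it contains an ASCII Latin letter" are all assumptions made for illustration. A real implementation would presumably load the full Unicode confusables data instead.

```python
import re

# Tiny illustrative subset (assumption); the real feature is described as
# using the full Unicode confusables database.
CONFUSABLE_TO_LATIN = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
}

# Bidirectional embedding, override, and isolate controls.
BIDI_CONTROLS = re.compile("[\u202a-\u202e\u2066-\u2069]")


def sanitize_prompt(text):
    """Return (corrected_text, number_of_characters_changed).

    Hypothetical helper: strips bidi controls everywhere, then replaces
    confusables only inside words that also contain ASCII Latin letters.
    """
    stripped, removed = BIDI_CONTROLS.subn("", text)
    replaced = 0
    parts = []
    # The capture group keeps the whitespace separators in the result.
    for part in re.split(r"(\s+)", stripped):
        has_latin = any(ch.isascii() and ch.isalpha() for ch in part)
        if has_latin:
            fixed = "".join(CONFUSABLE_TO_LATIN.get(ch, ch) for ch in part)
            replaced += sum(a != b for a, b in zip(part, fixed))
            part = fixed
        parts.append(part)
    return "".join(parts), removed + replaced


print(sanitize_prompt("p\u0430ypal.com"))  # ('paypal.com', 1)
```

Keeping the original string alongside the returned value is what would make an Undo action possible; the count is the kind of figure a warning banner could display.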
Allowed script combinations
Some languages naturally mix scripts. These combinations are not flagged:

| Combination | Reason |
|---|---|
| Han + Hiragana + Katakana | Japanese text |
| Han + Hangul | Korean text with Hanja |
| Han + Bopomofo | Chinese phonetic annotation |
| Han + Latin | Technical/international contexts (Han has no Latin lookalikes) |
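As a rough sketch of how the table above could be applied per word, the Python below collects the scripts in a word and treats the word as suspicious only when the mix is not covered by an allowed combination. Several things here are assumptions: the names `scripts_in` and `is_suspicious_mix` are hypothetical, the script of a character is guessed from the first word of its Unicode name (a heuristic, since the standard library does not expose the Script property), and any subset of an allowed combination is treated as allowed.

```python
import unicodedata

# Allowed combinations, taken from the table above.
ALLOWED_COMBINATIONS = [
    {"Han", "Hiragana", "Katakana"},
    {"Han", "Hangul"},
    {"Han", "Bopomofo"},
    {"Han", "Latin"},
]

# Heuristic (assumption): map the first word of a character's Unicode
# name to a script. Characters with other prefixes are ignored.
NAME_PREFIX_TO_SCRIPT = {
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
    "GREEK": "Greek",
    "CJK": "Han",
    "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana",
    "HANGUL": "Hangul",
    "BOPOMOFO": "Bopomofo",
}


def scripts_in(word):
    """Return the set of recognized scripts used by letters in word."""
    scripts = set()
    for ch in word:
        if not ch.isalpha():
            continue  # digits, punctuation, and symbols carry no script here
        prefix = unicodedata.name(ch, "").split(" ")[0]
        script = NAME_PREFIX_TO_SCRIPT.get(prefix)
        if script:
            scripts.add(script)
    return scripts


def is_suspicious_mix(word):
    """True if word mixes scripts in a way no allowed combination covers."""
    scripts = scripts_in(word)
    if len(scripts) < 2:
        return False  # single-script words are never flagged
    return not any(scripts <= allowed for allowed in ALLOWED_COMBINATIONS)
```

Under these assumptions, `is_suspicious_mix("p\u0430ypal")` is true (Latin mixed with Cyrillic), while a word mixing Han with Latin or with Hiragana and Katakana is not.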
Limitations
- Detection is per-word. A word written entirely in one script is not flagged, even if its characters have lookalikes in another script.
- Only characters with entries in the Unicode confusables database are detected.
- The feature validates prompt text only, not file contents or tool outputs.
Related
- Command deny list — block specific commands from agent execution
- Executable deny list — block binaries at the kernel level
- Guardrails overview