Make Sure We're Using the Same Language

Tuesday, June 27, 2017

Jay Kelley


A new spin on an old cyberattack threat was uncovered earlier this year by a Chinese security researcher and has since been reported on extensively by the security press.

While this repurposed threat has not yet been seen in a publicly known attack, it is a particularly devious one that could lead to a spate of phishing attacks meant to spread malware and steal critical credentials.

The re-spun threat leverages non-ASCII characters found in non-English alphabets, many of which strongly resemble, or are identical to, characters in the English (Latin) alphabet. For this reason, the Internet Corporation for Assigned Names and Numbers (ICANN), the non-profit organization responsible for maintaining and securing the databases that make up the Internet's naming system, decided against using Unicode, the computer industry standard for representing text in the most widely used writing systems, directly in domain names. Because many Unicode characters look alike, doing so could cause confusion, introduce insecurities into Internet naming, and easily spawn phishing attacks.

This sort of attack is known as an internationalized domain name (IDN) homograph attack. It is akin to another form of attack, typosquatting, in which a hacker uses a similar but usually misspelled word or brand name for nefarious purposes, such as creating websites for phishing and credential theft.

Instead, ICANN decided to use Punycode for Internet naming. Punycode is a way to represent non-ASCII characters, such as those in non-English or non-Latin writing systems, using only ASCII characters: English alphabet letters, numbers and hyphens. A domain label encoded this way carries the prefix “xn--”.
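To make the encoding concrete, here is a minimal sketch, assuming Python's built-in "punycode" codec (which implements RFC 3492) and adding the "xn--" prefix by hand. The helper name to_ace is hypothetical, and full IDNA processing (normalization and validity rules) is deliberately left out.

```python
# Minimal sketch: encode ONE domain label in its ASCII-compatible ("xn--") form.
# Assumption: Python's "punycode" codec does the RFC 3492 work; the prefix is
# added manually and no IDNA normalization or validation is performed.

ACE_PREFIX = "xn--"


def to_ace(label: str) -> str:
    """Return the ASCII form of a label, e.g. 'bücher' -> 'xn--bcher-kva'."""
    if label.isascii():
        # Pure-ASCII labels are used as-is; only non-ASCII labels are encoded.
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    for label in ["bank", "bücher"]:
        print(label, "->", to_ace(label))
```

Note the design choice of leaving ASCII labels untouched: the codec would otherwise append a trailing hyphen to a pure-ASCII string.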

Web browsers were designed to read the Punycode form of a URL and translate it into Unicode characters for display. But many web browser developers realized that Punycode could be abused, for example to disguise the URLs of phishing websites as valid URLs and websites.
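The reverse step, turning the stored "xn--" form back into Unicode for the address bar, can be sketched the same way. This is an illustration under the same assumption (Python's "punycode" codec, no IDNA validation); from_ace and unicode_host are hypothetical names, not any browser's code.

```python
# Sketch of the display-side translation: "xn--" labels -> Unicode characters.
# Assumption: Python's "punycode" codec; no IDNA validity checks are applied.

ACE_PREFIX = "xn--"


def from_ace(label: str) -> str:
    """Decode one label for display: 'xn--bcher-kva' -> 'bücher'."""
    if label.lower().startswith(ACE_PREFIX):
        # Strip the prefix and decode the remainder as Punycode.
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label  # ordinary ASCII labels need no translation


def unicode_host(host: str) -> str:
    """Translate every label of a hostname into its Unicode display form."""
    return ".".join(from_ace(label) for label in host.split("."))


if __name__ == "__main__":
    print(unicode_host("xn--bcher-kva.example"))
```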

Some web browsers, attempting to block URLs spoofed with characters from different writing systems, included filters that detect whether a URL mixes alphabets. If a URL contained both English/Latin and Cyrillic characters, for example, the browser would render it in Punycode instead of Unicode; it would render a URL in Unicode only if all of its characters came from the same alphabet. For instance, “bank” spelled entirely with English/Latin letters is displayed simply as “bank”. But if someone spelled it with the Cyrillic letter “ve” (в) as the first letter, the mixed-alphabet word “вank” would be displayed as “xn--ank-edd”, its Punycode equivalent, because it mixes English/Latin and Cyrillic letters.
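A mixed-alphabet filter of this kind might look roughly like the following sketch. It is not any browser's actual code: real implementations consult the Unicode Script property, which Python's standard library does not expose, so this version assumes the first word of each character's Unicode name ("LATIN", "CYRILLIC", ...) is a good enough stand-in. The names script_of and display_label are hypothetical.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Crude script guess: the first word of the character's Unicode name.

    Assumption: a stand-in for the real Unicode Script property.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def display_label(label: str) -> str:
    """Show a label in Unicode only if all of its letters share one script."""
    if label.isascii():
        return label
    scripts = {script_of(ch) for ch in label if ch.isalpha()}
    if len(scripts) > 1:
        # Mixed scripts (e.g. Latin + Cyrillic): fall back to Punycode.
        return "xn--" + label.encode("punycode").decode("ascii")
    return label


if __name__ == "__main__":
    print(display_label("bank"))  # plain Latin, shown as-is
    print(display_label("вank"))  # Cyrillic 'в' + Latin 'ank', shown as Punycode
```

An all-Cyrillic label such as "аррӏе" passes this check untouched, because only one script is present; that gap is the one the new variation exploits.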

This all made sense, until earlier this year, when Xudong Zheng, a Chinese security researcher, uncovered a new attack variation.

The variation works like this: if a domain is registered using only letters from a single non-Latin alphabet whose characters closely resemble English/Latin letters, no alphabets are mixed, so the URL remains in Unicode and is not displayed in Punycode. This spoofs the real website's URL, enabling a malicious person to set up a phishing website under what appears to be a legitimate URL.
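Listing the code points makes the trick visible where the eye cannot. This small sketch (describe is a hypothetical helper) prints each character's code point and Unicode name; every letter of the look-alike "apple" turns out to be Cyrillic, which is why a single-script filter raises no objection.

```python
import unicodedata
from typing import List


def describe(label: str) -> List[str]:
    """List each character's code point and Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in label]


if __name__ == "__main__":
    latin, lookalike = "apple", "аррӏе"  # the second string is all Cyrillic
    print(latin == lookalike)  # the strings differ despite looking alike
    for line in describe(lookalike):
        print(line)
```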

While this attack has not yet been found “in the wild”, it is an extremely dangerous variant because it is almost completely undetectable. Only a sharp-eyed, trained observer might notice the slightest differences between the URLs, and only a truly security-conscious person would inspect the web page’s certificate, which would show the domain in Punycode. Otherwise, the phishing page and attack could be imperceptible.

The example Mr. Zheng used in his blog post was “apple.com”. Using only lowercase Cyrillic letters, “a” (а), “er” (р), “ye” (е), and “palochka” (ӏ), whose Punycode form is “xn--80ak6aa92e”, he created a phishing web page that appears to come from the “apple.com” domain.

Another example of a potential URL that could be created to fool users is “paypal.com”. Using the lowercase Cyrillic letters “er” (р), “a” (а), and “u” (у), plus an uppercase “palochka” (Ӏ), whose Punycode form is “xn--80aa0cbo66e”, the URL would appear in certain web browsers as “раураӀ.com”. As you can see, it’s nearly impossible to tell the real URL in English/Latin letters from the URL in Cyrillic letters.
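The Punycode strings quoted for both look-alikes can be checked with a short round trip. This sketch assumes Python's "punycode" codec and skips IDNA normalization, which matters here because normalization could lowercase the capital palochka; ace_for and unicode_for are hypothetical names.

```python
ACE_PREFIX = "xn--"


def ace_for(label: str) -> str:
    """Encode a non-ASCII label as 'xn--' + Punycode (no IDNA normalization)."""
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def unicode_for(ace: str) -> str:
    """Decode an 'xn--' label back to the Unicode characters it stands for."""
    return ace[len(ACE_PREFIX):].encode("ascii").decode("punycode")


# Cyrillic spellings from the text and the Punycode forms quoted for them.
EXAMPLES = {
    "аррӏе": "xn--80ak6aa92e",    # а, р, р, ӏ (small palochka), е
    "раураӀ": "xn--80aa0cbo66e",  # р, а, у, р, а, Ӏ (capital palochka)
}

if __name__ == "__main__":
    for cyrillic, quoted in EXAMPLES.items():
        print(f"{cyrillic}.com -> {ace_for(cyrillic)}.com",
              "matches text:", ace_for(cyrillic) == quoted,
              "round trip:", unicode_for(quoted) == cyrillic)
```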

In the initial research findings, several very popular web browsers fell victim to this homograph attack: Google Chrome, Mozilla Firefox, and Opera could not differentiate between English/Latin and Cyrillic letters in a URL. Because only one writing system was used, rather than a mix of alphabets, no red flags were raised.

Since this attack variant was first reported, Google has updated Chrome to address the issue; a permanent fix landed in Chrome Stable 58. Opera addressed the situation in a late April 2017 release (Opera Stable 44.0.2510.1449).
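The details of these browser fixes are not covered here. As a hedged illustration of the general idea only, the sketch below flags a whole-script Cyrillic label built entirely from letters that have Latin look-alikes and shows it in Punycode anyway. The tiny CYRILLIC_TO_LATIN table and the helper names are hypothetical; a production checker would draw on the full Unicode confusables data defined in UTS #39.

```python
# Hypothetical, deliberately tiny confusables table (Cyrillic -> Latin look-alike).
CYRILLIC_TO_LATIN = {
    "а": "a", "е": "e", "о": "o", "р": "p", "с": "c",
    "у": "y", "х": "x", "ӏ": "l", "Ӏ": "l",
}


def skeleton(label: str) -> str:
    """Replace each known look-alike with the Latin letter it imitates."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in label)


def looks_latin(label: str) -> bool:
    """True if a non-ASCII label is made entirely of Latin look-alikes."""
    if label.isascii():
        return False  # genuine Latin labels are not suspicious here
    return skeleton(label).isascii()


def guarded_display(label: str) -> str:
    """Show suspicious look-alike labels in Punycode, others in Unicode."""
    if looks_latin(label):
        return "xn--" + label.encode("punycode").decode("ascii")
    return label


if __name__ == "__main__":
    for label in ["apple", "аррӏе", "раураӀ", "пример"]:
        print(label, "->", guarded_display(label))
```

A legitimate Cyrillic word such as "пример" still displays in Unicode here, because it contains letters with no Latin twin in the table.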

Mozilla has, to date, not made public whether it will address this issue in a dedicated patch or a future release. However, Mozilla did augment its whitelist “with something based on ascertaining whether all the characters in a label all come from the same script, or are from one of a limited and defined number of allowable combinations.” Mozilla is betting that any homographs within a single script “will be recognizable to people who understand that script.” In the meantime, Firefox users can manually force URLs to be displayed in Punycode instead of Unicode: type about:config in the address bar and set the network.IDN_show_punycode preference to “true”.
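The effect of forcing Punycode display can be mimicked in a few lines. This is not Firefox's implementation; it is a sketch in which a hypothetical show_punycode flag plays the role of the network.IDN_show_punycode preference and every non-ASCII label of a hostname is shown in its "xn--" form, assuming Python's "punycode" codec.

```python
def display_host(host: str, show_punycode: bool) -> str:
    """Render a hostname roughly the way an address bar might.

    show_punycode is a hypothetical stand-in for the network.IDN_show_punycode
    preference: when True, every non-ASCII label is shown in its xn-- form.
    """
    labels = []
    for label in host.split("."):
        if show_punycode and not label.isascii():
            label = "xn--" + label.encode("punycode").decode("ascii")
        labels.append(label)
    return ".".join(labels)


if __name__ == "__main__":
    print(display_host("аррӏе.com", show_punycode=False))
    print(display_host("аррӏе.com", show_punycode=True))
```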

Microsoft Edge, Internet Explorer, and Apple Safari have thus far been immune to this attack. Besides switching web browsers, another option that can alleviate this threat is isolation technology, which makes use of disposable virtual containers and advanced rendering technology.

While it hasn’t been seen or released into the wild yet, this IDN homograph attack is both deviously innovative and treacherous, and it needs to be planned for before it’s launched as an invisible, undetectable phishing assault.

The views expressed in this post are the opinions of the Infosec Island member that posted this content. Infosec Island is not responsible for the content or messaging of this post.

Unauthorized reproduction of this article (in part or in whole) is prohibited without the express written permission of Infosec Island and the Infosec Island member that posted this content--this includes using our RSS feed for any purpose other than personal use.