Man
Unicode is extremely complex. Few people know all of its quirks, from invisible and control characters to surrogate pairs and combined emoji (where a sequence of several characters renders as a single new one). The standard defines 17 planes of 2^16 (65,536) code points each, 1,114,112 in total. Learning Unicode can fairly be compared to learning a separate programming language.
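As a rough illustration of the points above, here is a small JavaScript sketch. The sample characters (the grinning face and the man, woman, girl sequence) are chosen here for demonstration only and do not come from the post.

```javascript
// Code space: 17 planes of 2^16 code points each, so the last one is U+10FFFF.
const totalCodePoints = 17 * 2 ** 16;
const lastCodePoint = totalCodePoints - 1;

// A character outside plane 0 is stored as a surrogate pair,
// that is, two UTF-16 code units.
const grinning = '\u{1F600}';
const codeUnits = grinning.length;        // counts UTF-16 code units
const codePoints = [...grinning].length;  // string iteration goes by code point

// A "combined" emoji: man + ZWJ + woman + ZWJ + girl is displayed as one glyph
// by fonts that support the sequence.
const ZWJ = '\u200D'; // zero-width joiner, an invisible character
const family = ['\u{1F468}', '\u{1F469}', '\u{1F467}'].join(ZWJ);

console.log(totalCodePoints, lastCodePoint.toString(16));
console.log(codeUnits, codePoints);
console.log(family.length, [...family].length);
```

The gap between `length` and the number of visible glyphs is the kind of detail that is easy to miss when validating or truncating user input.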
It is not surprising that web developers miss some of these nuances. Attackers, on the other hand, can and do exploit Unicode features for their own purposes.
Security specialist John Gracey demonstrated one such bug in the email address check used for forgotten-password recovery, using GitHub as an example. Similar bugs can be found on other sites.
Gracey explains the idea of a "character translation collision": two different characters are converted to the same character when their case is changed.
In this case he used the Turkish character 'ı' (a dotless 'i'), which becomes the Latin 'I' when uppercased, so the email address John@Gıthub.com is treated as John@Github.com:
Code:
'ß'.toUpperCase() // 'SS'
'ß'.toUpperCase() === 'ss'.toUpperCase() // true
// Note the Turkish dotless i: 'ı'.toUpperCase() is the plain Latin 'I'
'John@Gıthub.com'.toUpperCase() === 'John@Github.com'.toUpperCase() // true
Such collisions can be found throughout Unicode, and a complete list has been compiled.
We are primarily interested in the characters that convert to plain Latin letters. There are only eleven of them. The Turkish dotless 'ı' is in the second row of the table. The first ten collide when uppercased; the last one, the Kelvin sign, collides when lowercased.
Character | Code point | Result |
---|---|---|
ß | 0x00DF | SS |
ı | 0x0131 | I |
ſ | 0x017F | S |
ﬀ | 0xFB00 | FF |
ﬁ | 0xFB01 | FI |
ﬂ | 0xFB02 | FL |
ﬃ | 0xFB03 | FFI |
ﬄ | 0xFB04 | FFL |
ﬅ | 0xFB05 | ST |
ﬆ | 0xFB06 | ST |
K | 0x212A | k |
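A table like this can be reproduced with a brute-force scan. The sketch below is only an illustration: the function name `findLatinCollisions` is made up here, the scan is limited to the Basic Multilingual Plane, and the exact output may depend on the Unicode version your JavaScript engine ships with.

```javascript
// Scan the Basic Multilingual Plane for non-ASCII characters whose
// case-converted form consists only of ASCII letters.
function findLatinCollisions() {
  const asciiLetters = /^[A-Za-z]+$/;
  const found = [];
  for (let cp = 0x80; cp <= 0xffff; cp++) {
    if (cp >= 0xd800 && cp <= 0xdfff) continue; // skip lone surrogates
    const ch = String.fromCharCode(cp);
    const upper = ch.toUpperCase();
    const lower = ch.toLowerCase();
    if (asciiLetters.test(upper)) {
      found.push({ ch, codePoint: cp, method: 'toUpperCase', result: upper });
    } else if (asciiLetters.test(lower)) {
      found.push({ ch, codePoint: cp, method: 'toLowerCase', result: lower });
    }
  }
  return found;
}

const collisions = findLatinCollisions();
for (const { codePoint, method, result } of collisions) {
  const hex = codePoint.toString(16).toUpperCase().padStart(4, '0');
  console.log(`0x${hex} ${method} -> ${result}`);
}
```

Because the scan checks both conversions, it shows that which characters collide depends on whether the application uppercases or lowercases its input.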
GitHub allowed an attacker to take over someone else's account because the forgotten-password recovery procedure did not handle such addresses correctly.
This procedure compared the entered email address with the address stored in the database. The verification algorithm:
- The entered address is converted to lowercase using the toLowerCase method.
- The entered address is compared with the address in the database of registered users.
- If a match is found, the password reset message is sent to the entered address.
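The steps above can be sketched roughly as follows. This is not GitHub's actual code: the in-memory list `registeredEmails` and the functions `normalize` and `vulnerableReset` are hypothetical, and `toUpperCase` is used because in JavaScript that is the conversion under which the dotless 'ı' collides with 'I'.

```javascript
// Hypothetical in-memory "database" of registered addresses.
const registeredEmails = ['John@Github.com'];

// Naive normalization: case conversion and nothing else.
function normalize(email) {
  return email.toUpperCase();
}

// Vulnerable flow: the account is found through the normalized form,
// but the message goes to the address exactly as the user typed it.
function vulnerableReset(enteredEmail) {
  const match = registeredEmails.find(
    (stored) => normalize(stored) === normalize(enteredEmail)
  );
  if (match === undefined) return null;
  return { account: match, sendTo: enteredEmail };
}

// The attacker controls the domain spelled with a dotless i (U+0131).
const attack = vulnerableReset('John@G\u0131thub.com');
console.log(attack);
```

In this sketch the lookup succeeds for the victim's account while `sendTo` still holds the attacker's address, which is the whole problem.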
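Apparently, the developers were unaware of character translation collisions when normalizing case. Note that in JavaScript 'ı'.toLowerCase() leaves the character unchanged, so the dotless-i collision shown above appears when the strings are uppercased with toUpperCase; lowercasing produces a collision for the Kelvin sign instead.
In this case the error is easy to fix: send the reset message to the address stored in the database rather than to the address that was entered.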
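A minimal sketch of that quick patch, again with hypothetical names (`storedEmails`, `patchedReset`) and an in-memory list standing in for the user database:

```javascript
// Hypothetical in-memory "database" of registered addresses.
const storedEmails = ['John@Github.com'];

function patchedReset(enteredEmail) {
  const match = storedEmails.find(
    (stored) => stored.toUpperCase() === enteredEmail.toUpperCase()
  );
  if (match === undefined) return null;
  // Deliver only to the stored address, never to the user-supplied string.
  return { account: match, sendTo: match };
}

const result = patchedReset('John@G\u0131thub.com');
console.log(result);
```

The colliding address still matches the account, but the message now goes to the legitimate owner, so the attacker gains nothing from the collision.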
Of course, this is only a quick patch, not a complete fix. A more thorough solution is to convert the address to Punycode before comparing: John@Gıthub.com → xn--john@gthub-2ub.com. Punycode was designed to convert domain names unambiguously into a sequence of ASCII characters. An email address can be checked in the same way, but most web applications do not do this.
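One way to try this idea in JavaScript is to let the built-in `URL` parser produce the ASCII form of the domain. The sketch below is an assumption-laden illustration: the helper names `asciiDomain` and `sameDomain` are invented here, only the domain part is converted (Punycode is defined for domain labels, so the local part before the '@' would need separate handling), and splitting at the last '@' is a simplification.

```javascript
// Return the ASCII (Punycode) form of the domain part of an address,
// using the WHATWG URL parser available in browsers and Node.js.
function asciiDomain(email) {
  const at = email.lastIndexOf('@');
  const domain = email.slice(at + 1);
  return new URL(`http://${domain}`).hostname;
}

// Compare two addresses by the ASCII form of their domains.
function sameDomain(a, b) {
  return asciiDomain(a) === asciiDomain(b);
}

console.log(asciiDomain('John@Github.com'));      // plain ASCII, lowercased
console.log(asciiDomain('John@G\u0131thub.com')); // an ASCII label starting with xn--
console.log(sameDomain('John@Github.com', 'John@G\u0131thub.com'));
```

After conversion the two domains are different ASCII strings, so a comparison on that form no longer treats them as equal.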
John Gracey received a cash reward and 2,500 points in the rankings for finding the vulnerability, although he is still far behind the top GitHub hacker, Alexander Dobkin <img src=404 onerror=alert(document.domain)>: a user with this unusual name has already earned 30,750 points, including for arbitrary code execution on the GitHub servers that generate GitHub Pages.

Messenger crashes when receiving emoji with a black dot (Messenger on iOS, WhatsApp on Android)
Unicode-related bugs can turn up in any application that processes user-entered text. Vulnerabilities exist both in web applications and in native programs for Android and iOS. One of the most famous was the 2015 iOS bug in which a few Unicode characters in a text message crashed the operating system. Last year a similar Unicode bug, known as the "black dot", was discovered in iOS 11.3. A similar crash occurred in WhatsApp for Android when the user tapped an emoji.