Become a Bug bounty tutorial: Unit 2: Unicode Encoding

Unicode Encoding

In web application penetration testing, Unicode encoding is very important
because it can sometimes be used to defeat input validation mechanisms.

How it works:
If an input filter blocks certain malicious expressions, but the
component that subsequently processes the input understands Unicode
encoding, then it may be possible to bypass the filter using various standard
and malformed Unicode encodings.
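To illustrate the idea, here is a minimal sketch in Python. It assumes a filter that only looks for a literal "../" in the raw input, and a back-end decoder that tolerates overlong UTF-8 sequences (one well-known kind of malformed encoding). The names naive_filter and lenient_utf8_decode are hypothetical and do not come from any real product; a real target may behave quite differently.

```python
from urllib.parse import unquote_to_bytes


def naive_filter(raw: str) -> bool:
    """Hypothetical filter: accept input only if it has no literal '../'."""
    return "../" not in raw


def lenient_utf8_decode(data: bytes) -> str:
    """Hypothetical back-end decoder that does not reject overlong
    2-byte sequences. All other bytes are mapped one-to-one for brevity."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        is_lead = 0xC0 <= b <= 0xDF
        if is_lead and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0xBF:
            # Combine the 5 payload bits of the lead byte with the
            # 6 payload bits of the continuation byte.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)


# %c0%af is an overlong (malformed) 2-byte encoding of "/".
payload = "..%c0%af..%c0%afetc/passwd"

passed_filter = naive_filter(payload)      # the filter sees no "../"
raw_bytes = unquote_to_bytes(payload)      # percent-decoding happens later
decoded = lenient_utf8_decode(raw_bytes)   # the back end rebuilds "/"

# A strict UTF-8 decoder refuses the same bytes.
try:
    raw_bytes.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True

print(passed_filter, decoded, strict_rejects)
```

The filter runs before decoding, so it never sees the characters it is meant to block. Python's built-in strict decoder rejects the overlong bytes, which is why the sketch needs its own lenient decoder to model a vulnerable component.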

Unicode is a character encoding standard that is designed to support all of the
writing systems used in the world. It employs various encoding schemes, some
of which can be used to represent unusual characters in web applications.

16-bit Unicode encoding works in a similar way to URL-encoding. For
transmission over HTTP, the 16-bit Unicode-encoded form of a character is the
%u prefix followed by the character’s Unicode code point expressed in
hexadecimal. The %u form is a nonstandard extension that only some platforms
decode (Microsoft IIS is a well-known example). For example:

%u00e9 é
%u2215 ∕ (the division slash, not the ASCII /)
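The %u form can be reproduced with a few lines of Python. This is only a sketch: the helper names encode_percent_u and decode_percent_u are made up for illustration, and the code assumes every character fits in 16 bits, as the format itself does.

```python
import re


def encode_percent_u(text: str) -> str:
    """Encode every character as %u followed by four hex digits."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Assumption: this sketch only covers 16-bit code points.
            raise ValueError("code point outside the 16-bit range")
        parts.append(f"%u{cp:04x}")
    return "".join(parts)


_PERCENT_U = re.compile(r"%u([0-9a-fA-F]{4})")


def decode_percent_u(text: str) -> str:
    """Replace each %uXXXX sequence with the character it names."""
    return _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), text)


print(encode_percent_u("é"))              # %u00e9
print(decode_percent_u("%u2215") == "\u2215")  # True
```

Text that contains no %u sequences passes through decode_percent_u unchanged, so the helper can be applied to mixed input.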

UTF-8 is a variable-length encoding standard that employs one or more
bytes to express each character. For transmission over HTTP, the UTF-8
encoded form of a multi-byte character is each of its bytes written as two
hexadecimal digits and preceded by the % prefix. For example:

%e2%89%a0 ≠
%c2%a9 ©
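The two examples above can be checked with Python's standard urllib.parse module. Note that quote emits uppercase hex digits, while unquote accepts either case.

```python
from urllib.parse import quote, unquote

# Raw UTF-8 bytes of each character, shown as hex.
neq_bytes = "≠".encode("utf-8").hex()    # 'e289a0'
copy_bytes = "©".encode("utf-8").hex()   # 'c2a9'

# Percent-encode: each UTF-8 byte becomes % plus two hex digits.
neq_encoded = quote("≠")     # '%E2%89%A0'
copy_encoded = quote("©")    # '%C2%A9'

# Percent-decode: the bytes are reassembled and decoded as UTF-8.
neq_decoded = unquote("%e2%89%a0")   # '≠'
copy_decoded = unquote("%c2%a9")     # '©'

print(neq_bytes, neq_encoded, neq_decoded)
print(copy_bytes, copy_encoded, copy_decoded)
```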

Author Description

Techeia.com