There are only two RFC-standard ways to encode a Request URI: hex encoding and UTF-8 Unicode encoding. Non-standard variants (double percent hex encoding, double nibble hex encoding, first/second nibble hex encoding, 2/3-byte UTF encoding and %U UTF encoding) should be blocked. Mismatched encoding should also be handled.
URI Hex Encoding
Hex encoding escapes a character by writing a ‘%’ followed by the character's byte value in hexadecimal. For example, a capital A has the byte value \x41, so it is hex encoded as %41. In double percent encoding, the percent sign itself is hex encoded as %25 and is followed by the hexadecimal byte value of the character. So %2541 = ‘A’: the first decoding pass turns %25 into ‘%’, which leaves %41, and the second pass decodes %41 to a capital A.
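As a rough illustration of the two forms described above, the following Python sketch builds %41 and %2541 and decodes them with the standard library's `urllib.parse.unquote`. The helper name `hex_encode` is made up for this example and is not part of any library.

```python
from urllib.parse import unquote


def hex_encode(text: str) -> str:
    """Hypothetical helper: hex encode every byte of text as %XX."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


single = hex_encode("A")             # plain hex encoding
double = single.replace("%", "%25")  # double percent: '%' itself becomes %25

print(single, "->", unquote(single))
# %41 -> A
print(double, "->", unquote(double), "->", unquote(unquote(double)))
# %2541 -> %41 -> A
```

A single call to `unquote` only removes one layer, which is why a double percent encoded value still looks encoded after the first pass.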
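In first nibble encoding, only the first nibble of the byte value is hex encoded, i.e. the 4 of \x41 is written as %34.
So %%341 = ‘A’. During the first URL decoding pass the %34 is decoded as the numeral 4, which leaves %41 for the second pass. During the second pass, the %41 is decoded as a capital A. Second nibble encoding works the same way, except that the second nibble is the one encoded: the 1 of \x41 is written as %31, so %4%31 = ‘A’.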
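The two-pass behaviour can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not a production filter: the function names `decode_once`, `decoding_passes` and `is_multiply_encoded` are invented here, and the check simply counts how many decoding passes it takes before the URI stops changing. Anything that needs more than one pass is treated as suspicious.

```python
import re

# Matches one well-formed hex escape such as %41.
_HEX_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_once(uri: str) -> str:
    """Run a single decoding pass; decoded output is not rescanned.

    Each escape is mapped to the character with that byte value, which is
    enough for the ASCII examples here (no UTF-8 handling).
    """
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), uri)


def decoding_passes(uri: str, limit: int = 5) -> int:
    """Count the passes (up to limit) needed until the URI stops changing."""
    passes = 0
    while passes < limit:
        decoded = decode_once(uri)
        if decoded == uri:
            break
        uri = decoded
        passes += 1
    return passes


def is_multiply_encoded(uri: str) -> bool:
    """Assumed policy for this sketch: more than one pass means 'block'."""
    return decoding_passes(uri) > 1


for sample in ["%41", "%2541", "%%341", "%4%31", "%%34%31"]:
    print(sample, decode_once(sample), is_multiply_encoded(sample))
# %41 A False
# %2541 %41 True
# %%341 %41 True
# %4%31 %41 True
# %%34%31 %41 True
```

The last sample, %%34%31, is the double nibble form, where both nibbles are encoded. Note that this simple policy also flags a URI that legitimately contains an encoded percent sign followed by two hex digits, so a real filter would need to decide how to treat that case.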