UTF-16 (16-bit Unicode Transformation Format)
UTF-16 (16-bit Unicode Transformation Format) is a standard method of encoding Unicode character data. Part of the Unicode Standard since version 3.0, UTF-16 has the capacity to encode all currently defined Unicode characters. UTF-16 is specified in Annex Q of the ISO/IEC 10646 standard and in IETF RFC 2781.
Unicode is designed to accommodate all of the world's known writing systems. It currently provides three encodings for representing Unicode characters: UTF-8, UTF-16 and UTF-32. Each encoding defines how the characters of the Unicode character set are represented in binary form, for example in a file. Every character is assigned a number called a code point, and an encoding maps that code point to a binary sequence. Unicode provides for over one million code points: they range from 0 to 10FFFF in hexadecimal, which gives 1,114,112 code points in decimal. Unicode code points are divided into 17 planes of 65,536 code points each, of which Planes 0 through 2 are the most commonly used:
- Plane 0, known as the Basic Multilingual Plane (BMP), contains characters for almost all modern languages as well as most common special characters.
- Plane 1, known as the Supplementary Multilingual Plane (SMP), is used primarily for historic scripts such as Linear B and for musical and mathematical symbols.
- Plane 2, known as the Supplementary Ideographic Plane (SIP), is used for about 40,000 Unified Han Ideographs seldom used in daily written communications.
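The plane arithmetic can be illustrated with a short Python sketch. It assumes only that each plane holds 65,536 (hexadecimal 10000) code points, so the plane number is the code point shifted right by 16 bits. The names `plane_of`, `PLANE_SIZE` and `TOTAL_CODE_POINTS` are hypothetical and chosen for this example.

```python
PLANE_SIZE = 0x10000        # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF   # highest Unicode code point

# Code points run from 0 to 0x10FFFF inclusive, so there is one more
# code point than the maximum value: 17 planes * 65,536 = 1,114,112.
TOTAL_CODE_POINTS = MAX_CODE_POINT + 1


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains a code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16


if __name__ == "__main__":
    print(TOTAL_CODE_POINTS)      # total number of code points
    print(plane_of(ord("A")))     # Latin capital A, in the BMP
    print(plane_of(0x1D11E))      # MUSICAL SYMBOL G CLEF, in the SMP
    print(plane_of(0x20000))      # first code point of the SIP
```
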
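UTF-16 encodes each character as either one or two 16-bit units: characters in the BMP take a single unit, while characters in the supplementary planes take a pair of units. Because a 16-bit unit can be mapped to 8-bit (octet) sequences in either byte order, there are three encoding schemes built around the basic 16-bit model: UTF-16BE (big-endian), UTF-16LE (little-endian) and UTF-16, in which the byte order can be indicated by a byte order mark at the start of the data.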
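As a rough illustration of the one-or-two-unit model, the following Python sketch computes the 16-bit units for a code point by hand and then serializes them in both byte orders. The surrogate ranges used here (D800 to DBFF for the first unit, DC00 to DFFF for the second) come from the UTF-16 definition in RFC 2781 rather than from the text above, and the function names `utf16_code_units` and `units_to_bytes` are hypothetical. This is a sketch for study, not a replacement for a library codec.

```python
def utf16_code_units(code_point: int) -> list:
    """Return the one or two 16-bit units that encode a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are reserved and cannot be encoded")
    if code_point < 0x10000:
        # BMP character: a single 16-bit unit holds the code point itself.
        return [code_point]
    # Supplementary character: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # lower 10 bits
    return [high, low]


def units_to_bytes(units: list, byteorder: str) -> bytes:
    """Serialize 16-bit units to octets; byteorder is 'big' or 'little'."""
    return b"".join(unit.to_bytes(2, byteorder) for unit in units)


if __name__ == "__main__":
    clef = 0x1D11E  # MUSICAL SYMBOL G CLEF, a Plane 1 character
    units = utf16_code_units(clef)
    print([f"{u:04X}" for u in units])
    # Compare the hand-built bytes with Python's built-in codecs.
    print(units_to_bytes(units, "big") == chr(clef).encode("utf-16-be"))
    print(units_to_bytes(units, "little") == chr(clef).encode("utf-16-le"))
```

Python's built-in `utf-16-be` and `utf-16-le` codecs serve as the reference here; the plain `utf-16` codec additionally prepends a byte order mark when encoding.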
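UTF-16 is sometimes used interchangeably with UCS-2, although such use is not strictly correct: UCS-2 is a fixed-width 16-bit encoding that can represent only the characters of the BMP, whereas UTF-16 uses pairs of 16-bit units to reach the supplementary planes.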