January 26, 2000 - Unicode
Yehuda Shiran, Ph.D.
A Unicode character is written in JavaScript as an escape sequence: \u (backslash followed by a lowercase u) and four hexadecimal digits. The Unicode value \u0041, for example, represents the character A. The unique Unicode name for this character is LATIN CAPITAL LETTER A.
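The short sketch below shows that the escape sequence and the literal character produce the same one-character string. It uses only the standard `charCodeAt` method; the variable names are illustrative.

```javascript
// The escape sequence \u0041 and the literal "A" build the same string.
const fromEscape = "\u0041";
const fromLiteral = "A";

console.log(fromEscape === fromLiteral); // true
console.log(fromEscape.length);          // 1

// charCodeAt returns the numeric value of the character; 0x41 is 65.
console.log(fromEscape.charCodeAt(0).toString(16)); // 41
```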
Unicode is compatible with ASCII. The first 128 Unicode characters correspond to the ASCII characters and have the same numeric values: ASCII's 0x41 is the same character as Unicode's \u0041. While ASCII's 128 characters support little more than the basic Latin alphabet, Unicode's 16-bit values can represent more than 65,000 characters, enough to support many different languages. Unicode is fully compatible with the ISO 10646-1 standard and its UCS-2 encoding. JavaScript programs are still written mostly in ASCII characters, but you can use non-ASCII Unicode characters in the comments and string literals of JavaScript.
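As a minimal sketch of non-ASCII characters in a string literal, the example below writes the German word "Grüße" once with the characters typed directly and once with escape sequences (\u00FC is ü, \u00DF is ß). It assumes the script file is saved in an encoding the JavaScript engine reads correctly, such as UTF-8.

```javascript
// Non-ASCII characters typed directly into a string literal...
const direct = "Grüße";
// ...and the same characters spelled with Unicode escape sequences.
const escaped = "Gr\u00FC\u00DFe";

console.log(direct === escaped); // true
console.log(escaped.length);     // 5
```

The escaped form is the safer choice when you cannot be sure how an editor or server will encode the file.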
The calculator below accepts a Unicode value (just the four hexadecimal digits, without the \u prefix) and prints the corresponding character in the middle of the following sentence: "Unicode Demo: "
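The calculator's own code is not shown here, so the following is only a hypothetical sketch of the same idea: parse the four hex digits with `parseInt(hex, 16)` and turn the number into a character with `String.fromCharCode`. The function name `unicodeDemo` is an assumption, and since the rest of the demo sentence is not recoverable, the sketch simply appends the character to "Unicode Demo: ".

```javascript
// Hypothetical helper: converts four hexadecimal digits (no "\u" prefix)
// into the corresponding character and places it in the demo sentence.
function unicodeDemo(hex) {
  // Accept exactly four hex digits, as the calculator expects.
  if (!/^[0-9A-Fa-f]{4}$/.test(hex)) {
    throw new Error("Expected four hexadecimal digits, got: " + hex);
  }
  const code = parseInt(hex, 16);       // "0041" -> 65
  const ch = String.fromCharCode(code); // 65 -> "A"
  return "Unicode Demo: " + ch;
}

console.log(unicodeDemo("0041")); // Unicode Demo: A
console.log(unicodeDemo("00E9")); // Unicode Demo: é
```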
Here are some common special characters and their Unicode values:

Unicode Value   Name              Symbol
\u0009          Tab               <TAB>
\u000B          Vertical Tab      <VT>
\u000C          Form Feed         <FF>
\u0020          Space             <SP>
\u000A          Line Feed         <LF>
\u000D          Carriage Return   <CR>
\u0022          Double Quote      <">
\u0027          Single Quote      <'>
\u005C          Backslash         <\>
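To check the table, the sketch below compares each Unicode escape with JavaScript's shorter character escape for the same character (`\t`, `\v`, `\f`, `\n`, `\r`, `\"`, `\\`). In a JavaScript string literal, a Unicode escape such as \u0022 simply contributes the character and does not end the string. The array name `specialChars` is illustrative.

```javascript
// Each entry: [Unicode escape, equivalent literal or short escape, name].
const specialChars = [
  ["\u0009", "\t", "Tab"],
  ["\u000B", "\v", "Vertical Tab"],
  ["\u000C", "\f", "Form Feed"],
  ["\u0020", " ", "Space"],
  ["\u000A", "\n", "Line Feed"],
  ["\u000D", "\r", "Carriage Return"],
  ["\u0022", "\"", "Double Quote"],
  ["\u0027", "'", "Single Quote"],
  ["\u005C", "\\", "Backslash"],
];

// Prints "true" for every row: both spellings yield the same character.
for (const [unicodeEscape, shortEscape, name] of specialChars) {
  console.log(name + ": " + (unicodeEscape === shortEscape));
}
```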
You can play with our Unicode calculator above and find many Unicode values that yield unexpected characters. Although Unicode can represent more than 65,000 different characters, it is up to your browser to provide fonts that display them, and the installed fonts often cover only part of the Unicode range. In addition to the browser's support, the client platform must support Unicode as well. Some platforms, such as Windows 95, provide only partial support for Unicode.
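A missing font is a display problem, not a data problem. The sketch below, which assumes nothing beyond standard string methods, builds the CJK ideograph U+4E2D: whether or not your browser can draw it, the string still holds exactly that one character and its value.

```javascript
// U+4E2D is a CJK ideograph. Rendering it depends on the available fonts,
// but the string value is the same on every platform.
const cjk = "\u4E2D";

console.log(cjk.length);                     // 1
console.log(cjk.charCodeAt(0).toString(16)); // 4e2d
```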
The other problem with Unicode is how to enter non-ASCII characters. Often, the only way to specify them is with Unicode escape sequences like those in the table above. Composite characters raise a further issue. Many French characters, for example, are Latin letters with added diacritical marks such as acute and grave accents, circumflexes, and cedillas. The Unicode specification allows such a character to be written either as a single precomposed character (é is \u00E9) or as a combining sequence: the base Latin letter followed by the Unicode value of the mark (e followed by \u0301, the combining acute accent). JavaScript, like other implementations, does not interpret combining sequences; it treats the letter and the mark as two separate characters. Use the escape sequence of the precomposed character for each French character instead.
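The following sketch illustrates why the precomposed escape is the safer choice. The two strings can look identical on screen, yet JavaScript compares them character by character, so the combining sequence is longer and not equal to the precomposed form.

```javascript
// Precomposed: one character, LATIN SMALL LETTER E WITH ACUTE.
const precomposed = "\u00E9";
// Combining sequence: "e" followed by COMBINING ACUTE ACCENT (U+0301).
const combining = "e\u0301";

console.log(precomposed.length);        // 1
console.log(combining.length);          // 2
// Strict equality compares the characters themselves, not their appearance.
console.log(precomposed === combining); // false
```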
Unicode support was introduced in JavaScript 1.3. Learn more about the features of JavaScript 1.3 in Column 25, JavaScript 1.3 Overview, Part I, and Column 26, JavaScript 1.3 Overview, Part II.