January 26, 2000 - Unicode

Tips: January 2000

Yehuda Shiran, Ph.D.
Doc JavaScript

The Unicode standard is a uniform character encoding scheme. It is designed for the interchange and display of text in many different languages, including historic scripts, technical and mathematical symbols, and multilingual texts. The Unicode standard specifies each character's identity and its numeric value. In JavaScript, a 16-bit numeric value is written as a Unicode escape sequence: the prefix \u (a backslash followed by a lowercase u) followed by four hexadecimal digits. The Unicode value \u0041, for example, represents the character A, whose unique Unicode name is LATIN CAPITAL LETTER A.
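
The escape notation can be checked directly. The following is a minimal sketch, assuming a modern JavaScript engine such as Node.js (for console.log). toUnicodeEscape is a hypothetical helper written for this example, not a built-in; it turns a character's 16-bit value into the four-digit \uXXXX form.

```javascript
// Hypothetical helper: format one 16-bit character as a JavaScript
// Unicode escape, i.e. "\u" followed by exactly four hex digits.
function toUnicodeEscape(ch) {
  var code = ch.charCodeAt(0);                 // 16-bit numeric value (0..0xFFFF)
  var hex = code.toString(16).toUpperCase();   // e.g. 65 -> "41"
  return "\\u" + ("0000" + hex).slice(-4);     // left-pad to four digits
}

console.log(toUnicodeEscape("A"));   // prints \u0041
console.log("\u0041" === "A");       // prints true
```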

Unicode is compatible with ASCII: the first 128 Unicode characters correspond to the ASCII characters and have the same numeric values. ASCII's 0x41 is the same as Unicode's \u0041. While ASCII's 128 characters support only the basic Latin alphabet, Unicode's 16-bit range of more than 65,000 characters can support many different languages. Unicode is fully compatible with ISO/IEC 10646-1 and its UCS-2 encoding form. JavaScript programs are still written in ASCII characters, but you can use non-ASCII Unicode characters in the comments and string literals of JavaScript.
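
A short sketch of both points: the ASCII code 0x41 and \u0041 name the same character, and non-ASCII text may appear in comments and, through escapes, in string literals. The variable names are illustrative only, and the file is assumed to be saved as UTF-8 so the non-ASCII comment is read correctly.

```javascript
// ASCII's 0x41 and the Unicode escape \u0041 are the same character.
var fromAscii = String.fromCharCode(0x41);
console.log(fromAscii === "\u0041");        // prints true

// Comments may contain non-ASCII text, such as the German word Grüße.
// In a string literal the same word can be spelled with escapes,
// keeping the literal itself in ASCII.
var greeting = "Gr\u00FC\u00DFe";           // u with diaeresis, sharp s
console.log(greeting.length);               // prints 5
console.log(greeting.charCodeAt(2) > 127);  // prints true: \u00FC lies outside ASCII
```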

The calculator below accepts a Unicode value (just the four hexadecimal digits, without \u) and prints the corresponding character in the middle of the following sentence: "Unicode Demo:Netscape Corporation"
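
The calculator's behavior can be approximated in a few lines. This is a hypothetical re-creation, not the page's original script: the function name unicodeDemo, the input check, and the choice to insert the character right after "Unicode Demo:" are all assumptions.

```javascript
// Hypothetical re-creation of the calculator: take four hex digits
// (no \u prefix) and splice the matching character into the sentence.
// Assumption: the character goes right after "Unicode Demo:".
function unicodeDemo(hexDigits) {
  if (!/^[0-9A-Fa-f]{4}$/.test(hexDigits)) {
    throw new Error("Expected exactly four hexadecimal digits, e.g. 0041");
  }
  var ch = String.fromCharCode(parseInt(hexDigits, 16)); // "0041" -> "A"
  return "Unicode Demo:" + ch + "Netscape Corporation";
}

console.log(unicodeDemo("0041")); // prints Unicode Demo:ANetscape Corporation
```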

Here are some common special characters and their Unicode values:

Unicode Value	Name	Symbol
\u0009	Tab	<TAB>
\u000B	Vertical Tab	<VT>
\u000C	Form Feed	<FF>
\u0020	Space	<SP>
\u000A	Line Feed	<LF>
\u000D	Carriage Return	<CR>
\u0022	Double Quote	<">
\u0027	Single Quote	<'>
\u005C	Backslash	<\>
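
The escapes in the table can be compared with JavaScript's shorter backslash escapes for the same characters. A minimal sketch, assuming a modern JavaScript engine; each comparison should print true.

```javascript
// Each row: [Unicode escape, equivalent short escape, name from the table].
var table = [
  ["\u0009", "\t", "Tab"],
  ["\u000B", "\v", "Vertical Tab"],
  ["\u000C", "\f", "Form Feed"],
  ["\u0020", " ", "Space"],
  ["\u000A", "\n", "Line Feed"],
  ["\u000D", "\r", "Carriage Return"],
  ["\u0022", "\"", "Double Quote"],
  ["\u0027", "'", "Single Quote"],
  ["\u005C", "\\", "Backslash"]
];

for (var i = 0; i < table.length; i++) {
  // Both spellings produce the same one-character string.
  console.log(table[i][2] + ": " + (table[i][0] === table[i][1]));
}
```

Because each pair is the same one-character string, either spelling can be used inside a string literal.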

You can play with our Unicode calculator above and find many Unicode values that yield unexpected characters. Although Unicode can support more than 65,000 different characters, it is up to your browser to provide fonts that can display them, and a font often includes glyphs for only some of the Unicode characters. In addition to the browser's support, the client platform must support Unicode as well. Some platforms, such as Windows 95, provide only partial Unicode support.

The other problem with Unicode is how to enter non-ASCII characters. Often, the only way to specify them is with Unicode escape sequences like those in the table above. The Unicode standard also allows a composite character to be specified as a combining sequence: the base character followed by one or more combining marks. Many French characters, for example, are Latin letters with added diacritics such as acute and grave accents, circumflexes, and cedillas. As a combining sequence, é is the Latin letter e (\u0065) followed by the combining acute accent (\u0301). JavaScript, like other implementations, does not interpret combining sequences; it treats e followed by \u0301 as two separate characters. Use the escape sequence of the precomposed character instead, such as \u00E9 for é.
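
The difference between a combining sequence and a precomposed character shows up in string length and comparison. A minimal sketch with illustrative variable names: because JavaScript compares strings character by character without interpreting combining marks, the two spellings of é are not equal.

```javascript
// é spelled two ways:
var combining = "e\u0301";    // base e + COMBINING ACUTE ACCENT
var precomposed = "\u00E9";   // single precomposed character

console.log(combining.length);           // prints 2: two separate characters
console.log(precomposed.length);         // prints 1
console.log(combining === precomposed);  // prints false: no combining is applied
```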

Unicode support was introduced in JavaScript 1.3. Learn more about the features of JavaScript 1.3 in Column 25, JavaScript 1.3 Overview, Part I, and Column 26, JavaScript 1.3 Overview, Part II.
