Having survived the Great Y2K threat, we are now observing an economy that
has become far more globalized than it was in the previous century. The extent
of the Internet's contribution is open to debate, but there is no question that
the associated information explosion supported the acceleration of market internationalization
that we now take for granted. For many of us, the realization of the extent
of countries' interdependence was driven home by the recent global economic
meltdown. So what does all this have to do with us Web developers? It's a resounding
wake-up call that we have to think of other nationalities when we develop our
websites and applications. In most cases, developing a web app only in English shuts out
much of the world's population and greatly reduces potential profits! With that
in mind, this article is the kickoff for a series that discusses the ramifications
of globalization on our websites and applications. Today's article will deal
with locales and their implementation in the PHP language.
The Many Names of Internationalization
Besides the interchangeability of the terms internationalization and globalization,
there are a few abbreviations (or numeronyms) that are used to shorten these
longish words. A popular abbreviation is i18n, where the 18 stands for the number
of letters between the first i and last n in internationalization. Another is
L10n. In this case, the 10 refers to the number of letters between the first
l and the last n in localization. The capital L in L10n also serves to further
distinguish it from the lowercase i in i18n. The term globalization, which is
favored by companies like Microsoft, IBM, and Sun Microsystems, can likewise
be abbreviated to g11n.
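The counting rule behind these numeronyms is simple enough to sketch in a few lines. The function name numeronym below is made up for this illustration, and Python is used only as a neutral scratchpad. Note that the result is lowercase, so the capital L in L10n remains a convention you apply by hand.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + number of inner letters + last letter."""
    if len(word) <= 3:
        # Too short to benefit: the abbreviation would be no shorter than the word.
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


for term in ("internationalization", "localization", "globalization"):
    print(term, "->", numeronym(term))
```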
The Locale Identifier
In computing, a locale is a set of parameters that defines attributes for a user's specific geographic location, including the user's language, country and certain cultural preferences that users want to see reflected in the interface. As such, we're speaking about a lot more than language; the locale also affects the formatting of currency, numbers, dates, and time. Typically, a locale is identified by a language identifier and a region identifier. On UNIX-like platforms, the identifier takes the form:
[language[_territory][.codeset][@modifier]]
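To make the bracketed format concrete, here is a minimal sketch that splits an identifier into its four parts. The names LocaleParts and parse_locale are invented for this example, and the character classes in the pattern are a deliberately loose assumption rather than a rule taken from any standard.

```python
import re
from typing import NamedTuple, Optional


class LocaleParts(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# One optional group per bracketed part of language[_territory][.codeset][@modifier].
_LOCALE_RE = re.compile(
    r"(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9_-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9_=-]+))?"
)


def parse_locale(identifier: str) -> LocaleParts:
    """Split a UNIX-style locale identifier; absent parts come back as None."""
    match = _LOCALE_RE.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a UNIX-style locale identifier: {identifier!r}")
    return LocaleParts(**match.groupdict())


print(parse_locale("en_AU.UTF-8"))
print(parse_locale("de_DE@euro"))
```

Only the language part is mandatory, which is why a bare identifier such as C also parses.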
For example, Australian English using the UTF-8 encoding is en_AU.UTF-8. Contrast
that with Windows, which uses a numeric locale identifier (LCID), such as
1033 for English (United States) and 1041 for Japanese (Japan). These numbers
consist of a language code (lower 10 bits) and culture code (upper bits) and
are therefore often written in hexadecimal notation, such as 0x0409 or 0x0411.
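The bit layout is easier to see with a little arithmetic. The sketch below follows the simplified description given here (lower 10 bits for the language, everything above for the culture); the function name split_lcid is hypothetical, and the full Windows layout has additional fields that this illustration ignores.

```python
def split_lcid(lcid: int) -> tuple:
    """Return (language_code, culture_code) per the simplified two-field layout."""
    language_code = lcid & 0x3FF   # keep only the lower 10 bits
    culture_code = lcid >> 10      # shift the lower 10 bits away
    return language_code, culture_code


for lcid in (1033, 1041):
    language_code, culture_code = split_lcid(lcid)
    print(f"{lcid} = 0x{lcid:04X}: language 0x{language_code:02X}, culture {culture_code}")
```

Both sample LCIDs share culture code 1 and differ only in the language code (0x09 for English, 0x11 for Japanese), which is much easier to spot in hexadecimal than in decimal.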
I personally became acquainted with LCIDs early on in my career as a developer
for the Canadian federal government. It is a nationwide standard that all of
our applications must support both English and French users, as Canada is officially
a bilingual nation.
You can use the locale command to show your current locale settings on UNIX platforms.
Adding the -a option, as in locale -a, displays all the locales currently installed
on the machine. You can control the system locale via various environment variables.
These can be defined system-wide or on a per-session basis:
- LC_ALL: Overrides all LC_* environment variables with the given value
- LC_CTYPE: Character classification and case conversion
- LC_COLLATE: Collation (sort) order
- LC_TIME: Date and time formats
- LC_NUMERIC: Non-monetary numeric formats
- LC_MONETARY: Monetary formats
- LC_MESSAGES: Formats of informative and diagnostic messages, and of interactive responses
- LC_PAPER: Paper size
- LC_NAME: Name formats
- LC_ADDRESS: Address formats and location information
- LC_TELEPHONE: Telephone number formats
- LC_MEASUREMENT: Measurement units (Metric or Other)
- LC_IDENTIFICATION: Metadata about the locale information
- LANG: The default value, which is used when neither LC_ALL nor the applicable LC_* variable is set
- NLSPATH: Delimited list of paths to search for message catalogs
- TZ: Time zone
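The order in which these variables are consulted can be sketched as a small lookup. The function name effective_locale is invented for this example, and the final fallback to "C" is an assumption about typical systems rather than something the list above guarantees.

```python
from typing import Mapping


def effective_locale(category: str, env: Mapping[str, str]) -> str:
    """Resolve a category such as "LC_TIME" the way the variables above describe."""
    # LC_ALL wins, then the category's own variable, then LANG.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset and empty variables are both skipped
            return value
    return "C"  # assumed default when nothing is set


sample_env = {"LANG": "en_CA.UTF-8", "LC_TIME": "fr_CA.UTF-8"}
print(effective_locale("LC_TIME", sample_env))
print(effective_locale("LC_MONETARY", sample_env))
```

Passing os.environ instead of the sample dictionary would apply the same lookup to your real session.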
Information on the current locale in Windows is accessible from the Regional Options dialog.
To access it, click on START=>Settings=>Control Panel to bring up the Control Panel,
then click on the Regional Options item to bring up the dialog.
Your current locale is selected in the Your Locale (location) dropdown list,
on the General tab. Below the Settings for the current user section is a list
of installed language packs. Other tabs define Number, Currency, Time,
and Date formatting options. The Input Locales tab allows you to change language-related
keyboard settings such as those that print non-English characters.
It's important to know how to configure your workstation's locale settings
because it can be helpful in testing locale-dependent application features.
It is also possible to change an application's locale through programming
language features, without altering your global machine settings.
Locale Support in Programming Languages
Generally, it is preferable to set application locale attributes directly using language features rather than alter your workstation's global settings. Here are a couple of points to keep in mind regarding locale support in programming languages:
- General rule 1: the newer a development language is, the likelier it is to offer multilingual and locale support; and that goes double for web languages.
- General rule 2: no two languages implement locale support in the same way!
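As a rough illustration of setting the locale from inside an application rather than on the workstation, here is a sketch using Python's standard locale module. Python is used purely as a stand-in here, and the helper name temporary_locale is made up for this example. It switches a single category for the duration of a block and then restores whatever was active before.

```python
import locale
from contextlib import contextmanager


@contextmanager
def temporary_locale(name: str, category: int = locale.LC_NUMERIC):
    """Switch one locale category for the duration of a with-block."""
    saved = locale.setlocale(category)       # no second argument: query only
    try:
        locale.setlocale(category, name)     # raises locale.Error if unavailable
        yield
    finally:
        locale.setlocale(category, saved)    # always restore the previous setting


# "C" is used because it is available everywhere; other locale names only
# work if that locale is installed on the machine.
with temporary_locale("C"):
    print(locale.format_string("%.2f", 1234.5, grouping=True))
```

Keep in mind that setlocale changes the setting for the whole process, so a helper like this is best suited to scripts and tests rather than multi-threaded servers.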
For our first look at locale support in programming languages, we'll be exploring
PHP's I18N libraries. PHP is a scripting language that was originally designed
for web development and the creation of dynamic content. In addition to being
well suited for multi-national applications, it is also one of the most popular
web development languages in use today.