Globalize your Web Applications: PHP's Locale Package | WebReference

By Rob Gravelle


Having survived the Great Y2K threat, we now find ourselves in an economy that is far more globalized than it was in the previous century. The extent of the Internet's contribution is open to debate, but there is no question that the associated information explosion accelerated the market internationalization that we now take for granted. For many of us, the extent of countries' interdependence was driven home by the recent global economic meltdown. So what does all this have to do with us Web developers? It's a resounding wake-up call that we have to think of other nationalities when we develop our websites and applications. In most cases, developing a web app only in English alienates much of the world's population and greatly reduces potential profits! With that in mind, this article kicks off a series on the ramifications of globalization for our websites and applications. Today's article deals with locales and their implementation in the PHP language.

The Many Names of Internationalization

The terms internationalization and globalization are often used interchangeably, and there are also a few abbreviations (or numeronyms) that shorten these longish words. A popular one is i18n, where the 18 stands for the number of letters between the first i and the last n in internationalization. Another is L10n, where the 10 refers to the number of letters between the first l and the last n in localization. The capital L in L10n also helps to distinguish it from the lowercase i in i18n. The term globalization, which is favored by companies like Microsoft, IBM, and Sun Microsystems, can likewise be abbreviated to g11n.
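The counting rule behind these abbreviations is simple enough to check mechanically. Here is a minimal sketch in Python, used purely for illustration; the function name numeronym is our own and not part of any library:

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as: first letter + number of inner letters + last letter."""
    if len(word) < 3:
        return word  # too short to have any inner letters worth counting
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("internationalization"))  # i18n
print(numeronym("Localization"))          # L10n
print(numeronym("globalization"))         # g11n
```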

The Locale Identifier

In computing, a locale is a set of parameters that defines attributes for a user's specific geographic location, including the user's language, country and certain cultural preferences that users want to see reflected in the interface. As such, we're speaking about a lot more than their language; the locale also affects the formatting of currency, numbers, dates, and time. Typically, different locales are identified by a language identifier and a region identifier.

UNIX-like systems and Windows both support locales, albeit using different mechanisms. On UNIX, Linux and other POSIX-type platforms, locale identifiers are defined in the following format: [language[_territory][.codeset][@modifier]]. For example, Australian English using the UTF-8 encoding is en_AU.UTF-8. Contrast that with Windows, which uses a numeric locale identifier (LCID), such as 1033 for English (United States) and 1041 for Japanese (Japan). The low 16 bits of an LCID form a language identifier made up of a primary language code (lower 10 bits) and a sublanguage code (upper 6 bits), which is why LCIDs are often written in hexadecimal notation, such as 0x0409 or 0x0411. I personally became acquainted with LCIDs early on in my career as a developer for the Canadian federal government. Federal applications are required to support both English and French users, as Canada is officially a bilingual nation.
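To make the two identifier schemes more concrete, the following Python sketch (illustration only) splits a POSIX-style identifier into its four parts and pulls the language fields out of an LCID. The names PosixLocale, parse_posix_locale and split_lcid are hypothetical helpers written for this article. The regular expression is a simplified reading of the format above, and the bit layout assumes the Windows convention in which the low 10 bits hold the primary language and the next 6 bits hold the sublanguage.

```python
import re
from typing import NamedTuple, Optional, Tuple


class PosixLocale(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# Simplified pattern for language[_territory][.codeset][@modifier]
_POSIX_LOCALE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def parse_posix_locale(identifier: str) -> PosixLocale:
    """Split a POSIX-style locale identifier into its optional parts."""
    match = _POSIX_LOCALE.match(identifier)
    if match is None:
        raise ValueError(f"not a POSIX-style locale identifier: {identifier!r}")
    return PosixLocale(**match.groupdict())


def split_lcid(lcid: int) -> Tuple[int, int]:
    """Return (primary_language, sublanguage) from the low 16 bits of an LCID."""
    lang_id = lcid & 0xFFFF           # language identifier
    return lang_id & 0x3FF, lang_id >> 10


print(parse_posix_locale("en_AU.UTF-8"))
print(f"0x{1033:04X}", split_lcid(1033))  # 0x0409 (9, 1)
print(f"0x{1041:04X}", split_lcid(1041))  # 0x0411 (17, 1)
```

Both sample LCIDs share the sublanguage value 1 and differ only in the primary language code, which is easier to see in hexadecimal than in decimal.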

On UNIX platforms, you can use the locale command to show your current locale settings; run with the -a option (locale -a), it lists all the locales currently installed on the machine. You can control the locale via various environment variables, which can be defined system-wide or on a per-session basis:

  • LC_ALL: Overrides all LC_* environment variables with the given value
  • LC_CTYPE: Character classification and case conversion
  • LC_COLLATE: Collation (sort) order
  • LC_TIME: Date and time formats
  • LC_NUMERIC: Non-monetary numeric formats
  • LC_MONETARY: Monetary formats
  • LC_MESSAGES: Formats of informative and diagnostic messages, and of interactive responses
  • LC_PAPER: Paper size
  • LC_NAME: Name formats
  • LC_ADDRESS: Address formats and location information
  • LC_TELEPHONE: Telephone number formats
  • LC_MEASUREMENT: Measurement units (Metric or Other)
  • LC_IDENTIFICATION: Metadata about the locale information
  • LANG: The default value, which is used for any category that is set neither by LC_ALL nor by its own LC_* variable
  • NLSPATH: Delimited list of paths to search for message catalogs
  • TZ: Time zone
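The order in which these variables are consulted (LC_ALL first, then the category's own LC_* variable, then LANG) can be sketched as a small lookup. This is an illustrative Python sketch, not how any particular C library implements it; effective_locale is a hypothetical helper, and falling back to "C" when nothing is set is an assumption, since the actual default is implementation-defined.

```python
import os
from typing import Mapping, Optional


def effective_locale(category: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the locale name that applies to one category, e.g. 'LC_TIME'."""
    env = os.environ if env is None else env
    # Precedence: LC_ALL overrides the category variable, which overrides LANG.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # an empty value is treated the same as an unset one
            return value
    return "C"  # assumed fallback; the real default is implementation-defined


# Example environment passed explicitly so the result does not depend on this machine.
session = {"LANG": "en_US.UTF-8", "LC_TIME": "fr_CA.UTF-8"}
print(effective_locale("LC_TIME", session))     # fr_CA.UTF-8
print(effective_locale("LC_NUMERIC", session))  # en_US.UTF-8
```

Calling effective_locale("LC_TIME") without the second argument reads the variables of the current session instead.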

Information on the current Locale in Windows is accessible from the Regional Options dialog:

Figure 1

To access the Regional Options dialog, click on START=>Settings=>Control Panel to bring up the Control Panel:

Figure 2

Then click on the Regional Options item to bring up the dialog:

Figure 3

Your current locale is selected in the Your Locale (location) dropdown list, on the General tab. Below it, the Settings for the current user section lists the installed language packs. Other tabs define Number, Currency, Time, and Date formatting options. The Input Locales tab allows you to change language-related keyboard settings, such as those used to type non-English characters.

It's important to know how to configure your workstation's locale settings because doing so is helpful when testing locale-dependent application features. It is also possible to change the locale of a single application through programming language features, without altering your global machine settings.
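As a rough illustration of changing the locale from inside a program, here is a minimal Python sketch built on the standard locale module. The temporary_locale helper is our own invention, and the example uses the "C" locale only because it is available on every system; other locale names work only if they are installed. Note that setlocale affects the whole process, so this pattern is best kept to single-threaded code such as tests.

```python
import locale
from contextlib import contextmanager


@contextmanager
def temporary_locale(category: int, name: str):
    """Switch one locale category for the duration of a with-block."""
    previous = locale.setlocale(category)  # no second argument: query only
    try:
        yield locale.setlocale(category, name)  # raises locale.Error if not installed
    finally:
        locale.setlocale(category, previous)    # restore the earlier setting


with temporary_locale(locale.LC_NUMERIC, "C") as active:
    # Number formatting now follows the "C" locale, whatever the machine settings are.
    print(active, locale.localeconv()["decimal_point"])
```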

Locale Support in Programming Languages

Generally, it is preferable to set application locale attributes directly using language features rather than alter your workstation's global settings. Here are a couple of points to keep in mind regarding locale support in programming languages:

  • General rule 1: the newer a development language is, the likelier it is to offer multilingual and locale support, and that goes double for web languages.
  • General rule 2: no two languages implement locale support in the same way!

For our first look at locale support in programming languages, we'll be exploring PHP's I18N libraries. PHP is a scripting language that was originally designed for web development and the creation of dynamic content. In addition to being well suited for multi-national applications, it is also one of the most popular web development languages in use today.

