Python Essentials: Operators and Expressions | 4

Unicode Strings

The use of standard strings and Unicode strings in the same program presents a number of subtle complications. This is because such strings may be used in a variety of operations, including string concatenation, comparisons, dictionary key lookups, and as arguments to built-in functions.

To convert a standard string, s, to a Unicode string, the built-in unicode(s [, encoding [,errors]]) function is used. To convert a Unicode string, u, to a standard string, the string method u.encode([encoding [, errors]]) is used. Both of these conversion operators require the use of a special encoding rule that specifies how 16-bit Unicode character values are mapped to a sequence of 8-bit characters in standard strings, and vice versa. The encoding parameter is specified as a string and is one of the following values:

Value	Description
`'ascii'`	7-bit ASCII
`'latin-1'` `or` `'iso-8859-1'`	ISO 8859-1 Latin-1
`'utf-8'`	8-bit variable-length encoding
`'utf-16'`	16-bit variable-length encoding (may be little or big endian)
`'utf-16-le'`	UTF-16, little endian encoding
`'utf-16-be'`	UTF-16, big endian encoding
`'unicode-escape'`	Same format as Unicode literals `u"string"`
`'raw-unicode-escape'`	Same format as raw Unicode literals `ur"string"`

The default encoding is set in the site module and can be queried using sys.getdefaultencoding(). In most cases, the default encoding is 'ascii', which means that ASCII characters with values in the range [0x00,0x7f] are directly mapped to Unicode characters in the range [U+0000, U+007F]. Details about the other encodings can be found in Chapter 9, "Input and Output."
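As a rough illustration of how the encoding choice changes the bytes produced, the sketch below encodes one Unicode string with several of the encodings listed above. It assumes Python 2.6 or later (or Python 3.3+), where the b"..." literal denotes a standard byte string. The variable names are illustrative, and s.decode(encoding) is used in place of unicode(s, encoding) only so that the same code also runs under Python 3.

```python
# Assumption: Python 2.6+ or 3.3+; b"..." is a standard (byte) string.
name = u"M\u00fcller"                 # Unicode string containing U+00FC

utf8 = name.encode("utf-8")           # U+00FC becomes the two bytes \xc3\xbc
latin1 = name.encode("latin-1")       # U+00FC becomes the single byte \xfc
utf16 = name.encode("utf-16")         # byte-order mark, then 2 bytes per character here
utf16le = name.encode("utf-16-le")    # explicit byte order, no byte-order mark

# Decoding with the matching encoding restores the original Unicode string.
roundtrip = utf8.decode("utf-8")
```

Decoding with a different encoding than the one used to encode generally yields either an exception or the wrong characters, so the encoding name has to travel with the data.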

When string values are being converted, a UnicodeError exception may be raised if a character that can't be converted is encountered. For instance, if the encoding rule is 'ascii', a Unicode character such as U+1F28 can't be converted because its value is too large. Similarly, the string "\xfc" can't be converted to Unicode because it contains a character outside the range of valid ASCII character values. The errors parameter determines how encoding errors are handled. It's a string with one of the following values:

Value	Description
`'strict'`	Raises a `UnicodeError` exception for encoding and decoding errors.
`'ignore'`	Ignores invalid characters.
`'replace'`	Replaces invalid characters with a replacement character (`U+FFFD` in Unicode, `'?'` in standard strings).
`'backslashreplace'`	Replaces invalid characters with a Python character escape sequence. For example, the character U+1234 is replaced by `'\u1234'`.
`'xmlcharrefreplace'`	Replaces invalid characters with an XML character reference. For example, the character U+1234 is replaced by `'&#4660;'`.

The default error handling is 'strict'.
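The effect of each errors value can be seen by encoding a string that contains U+1234 with the 'ascii' rule. This is a minimal sketch, assuming Python 2.6 or later (or Python 3.3+) so that b"..." literals are available. The variable names are made up for the example.

```python
# Assumption: Python 2.6+ or 3.3+; b"..." is a standard (byte) string.
text = u"x\u1234y"     # U+1234 has no ASCII equivalent

replaced = text.encode("ascii", "replace")            # b"x?y"
ignored = text.encode("ascii", "ignore")              # b"xy"
escaped = text.encode("ascii", "backslashreplace")    # b"x\\u1234y"
xmlrefs = text.encode("ascii", "xmlcharrefreplace")   # b"x&#4660;y"

# With no errors argument the handling is 'strict', so encoding fails.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeError:
    strict_failed = True

# When decoding, 'replace' substitutes U+FFFD for each invalid byte.
decoded = b"\xfc".decode("ascii", "replace")          # u"\ufffd"
```

Note that 'ignore' and 'replace' lose information, whereas the two escape-style handlers keep enough detail to reconstruct the original character later.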

When standard strings and Unicode strings are mixed in an expression, standard strings are automatically coerced to Unicode using the built-in unicode() function. For example:

s = "hello"
t = u"world"
w = s + t     # w = unicode(s) + t

When Unicode strings are used in string methods that return new strings (as described in Chapter 3), the result is always coerced to Unicode. Here's an example:

a = "Hello World"
b = a.replace("World", u"Bob") # Produces u"Hello Bob"

Furthermore, even if zero replacements are made and the result is identical to the original string, the final result is still a Unicode string.

If a Unicode string is used as the format string with the % operator, all the arguments are first coerced to Unicode and then put together according to the given format rules. If a Unicode object is passed as one of the arguments to the % operator, the entire result is coerced to Unicode at the point at which the Unicode object is expanded. For example:

c = "%s %s" % ("Hello", u"World") # c = "Hello " + u"World"
d = u"%s %s" % ("Hello", "World") # d = u"Hello " + u"World"

When applied to Unicode strings, the str() and repr() functions automatically coerce the value back to a standard string. For Unicode string u, str(u) produces the value u.encode() and repr(u) produces "u%s" % repr(u.encode('unicode-escape')).

In addition, most library and built-in functions that only operate with standard strings will automatically coerce Unicode strings to a standard string using the default encoding. If such a coercion is not possible, a UnicodeError exception is raised.

Standard and Unicode strings can be compared. In this case, standard strings are coerced to Unicode using the default encoding before any comparison is made. This coercion also occurs whenever comparisons are made during list and dictionary operations. For example, 'x' in [u'x', u'y', u'z'] coerces 'x' to Unicode and returns True. For character containment tests such as 'W' in u'Hello World', the character 'W' is coerced to Unicode before the test.

When computing hash values with the hash() function, standard strings and Unicode strings produce identical values, provided that the Unicode string only contains characters in the range [U+0000, U+007F]. This allows standard strings and Unicode strings to be used interchangeably as dictionary keys, provided that the Unicode strings are confined to ASCII characters. For example:

a = { }
a[u"foo"] = 1234
print a["foo"]    # Prints 1234

However, it should be noted that this dictionary key behavior may not hold if the default encoding is ever changed to something other than 'ascii' or if Unicode strings contain non-ASCII characters. For example, if 'utf-8' is used as a default character encoding, it's possible to produce pathological examples in which strings compare as equal, but have different hash values. For example:

a = u"M\u00fcller"      # Unicode string
b = "M\303\274ller"     # utf-8 encoded version of a
print a == b            # Prints 'True'
print hash(a)==hash(b)  # Prints 'False'

Boolean Expressions and Truth Values

The and, or, and not keywords can form Boolean expressions. The behavior of these operators is as follows:

Operator	Description
x `or` y	If x is false, return y; otherwise, return x.
x `and` y	If x is false, return x; otherwise, return y.
`not` x	If x is false, return True; otherwise, return False.

When an expression is used to determine a true or false value, True, any nonzero number, and any nonempty string, list, tuple, or dictionary are taken to be true. False, zero, None, and empty strings, lists, tuples, and dictionaries evaluate as false. Boolean expressions are evaluated from left to right and consume the right operand only if it's needed to determine the final value. For example, a and b evaluates b only if a is true.
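The short-circuit behavior can be observed by recording which operands are actually evaluated. The check() helper and the calls list below are hypothetical names introduced only for this sketch.

```python
calls = []

def check(value):
    """Record that this operand was evaluated, then return it unchanged."""
    calls.append(value)
    return value

r1 = check(0) or check("fallback")   # 0 is false, so the right operand is evaluated
r2 = check([]) and check("unused")   # [] is false, so the right operand is skipped
r3 = not check("")                   # the empty string is false, so r3 is true

# r1 is "fallback", r2 is [], and "unused" never appears in calls.
```

Because and and or return one of their operands rather than a strict Boolean, an expression such as name or "default" is a common way to supply a fallback value.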

Object Equality and Identity

The equality operator (x == y) tests the values of x and y for equality. Lists and tuples are compared element by element and are equal only if they have the same length and all corresponding elements are equal. For dictionaries, a true value is returned only if x and y have the same set of keys and all the objects with the same key have equal values. Two sets are equal if they have the same elements, which are compared using equality (==).

The identity operators (x is y and x is not y) test two objects to see whether they refer to the same object in memory. In general, it may be the case that x == y, but x is not y.
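The difference between equality and identity is easiest to see with two separately created containers that hold the same values. The names in this short sketch are illustrative only.

```python
x = [1, 2, 3]
y = [1, 2, 3]    # a separate list with equal contents
z = x            # a second name for the same list object

equal_lists = (x == y)     # True: same length, equal elements
same_object = (x is y)     # False: two distinct objects in memory
alias = (x is z)           # True: both names refer to one object

# Dictionaries and sets compare by content, not by insertion order.
dict_equal = ({"a": 1, "b": 2} == {"b": 2, "a": 1})    # True
set_equal = (set([1, 2, 3]) == set([3, 2, 1]))         # True
```

Since z is an alias for x, appending to z also changes x, whereas y is unaffected.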

Comparison between objects of incompatible types, such as a file and a floating-point number, may be allowed, but the outcome is arbitrary and may not make any sense. In some cases, such a comparison may instead raise an exception.

Order of Evaluation

Table 4.2 lists the order of operation (precedence rules) for Python operators. All operators except the power (**) operator are evaluated from left to right and are listed in the table from highest to lowest precedence. That is, operators listed first in the table are evaluated before operators listed later. (Note that operators included together within subsections, such as x * y, x / y, x // y, and x % y, have equal precedence.)

Table 4.2 Order of Evaluation (Highest to Lowest)

Operator	Name
`(...), [...], {...}`	Tuple, list, and dictionary creation
`` `...` ``	String conversion
s`[`i`],` s`[`i`:`j`]`	Indexing and slicing
s`.`attr	Attributes
f`(...)`	Function calls
x `**` y	Power (right associative)
`+`x`, -`x`, ~`x	Unary operators
x `*` y`,` x `/` y`, x // y,` x `%` y	Multiplication, division, floor division, modulo
x `+` y`,` x `-` y	Addition, subtraction
x `<<` y`,` x `>>` y	Bit-shifting
x `&` y	Bitwise and
x `^` y	Bitwise exclusive or
x `|` y	Bitwise or
x `<` y`,` x `<=` y`,` x `>` y`,` x `>=` y`,` x `==` y`,` x `!=` y`,` x `<>` y`,` x `is` y`,` x `is not` y`,` x `in` s`,` x `not in` s	Comparison, identity, and sequence membership tests
`not` x	Logical negation
x `and` y	Logical and
x `or` y	Logical or
`lambda` args`:` expr	Anonymous function
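A few small expressions illustrate how these rules group operands when no parentheses are given. This is a sketch with made-up variable names, and each comment shows the equivalent fully parenthesized form.

```python
p1 = 2 ** 3 ** 2               # 2 ** (3 ** 2) = 512, power is right associative
p2 = -2 ** 2                   # -(2 ** 2) = -4, ** binds tighter than a unary minus on its left
p3 = 1 + 2 * 3                 # 1 + (2 * 3) = 7
p4 = 1 << 2 + 1                # 1 << (2 + 1) = 8
p5 = 1 | 2 ^ 3 & 4             # 1 | (2 ^ (3 & 4)) = 3
p6 = 5 & 3 == 1                # (5 & 3) == 1, which is true
p7 = not 1 == 2                # not (1 == 2), which is true
p8 = True or False and False   # True or (False and False), which is true
```

When an expression mixes several of these levels, explicit parentheses are usually clearer than relying on the reader's memory of the table.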

Created: March 27, 2003
Revised: March 13, 2006

URL: https://webreference.com/programming/python/1