Unicode
Python 2.0 provides support for Unicode strings
- Needed for internationalization, XML, etc.
Unicode: In a nutshell
- Internally, all character values are extended to 16-bit integers (a C short or wchar_t).
- Character values 0-127 represent the same characters as 8-bit ASCII.
- Otherwise, everything is about the same (well, mostly).
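The ASCII-compatibility point above can be sketched as follows. This is only an illustration: it assumes the u"..." prefix for Unicode literals, which this section has not introduced, and the variable names are made up for the example.

```python
# Character values 0-127 are the same in Unicode and in 8-bit ASCII.
ascii_text = "spam"
unicode_text = u"spam"   # u"..." marks a Unicode string literal (assumed syntax)

# ord() returns the integer character value of a one-character string.
ascii_codes = [ord(c) for c in ascii_text]
unicode_codes = [ord(c) for c in unicode_text]

print(ascii_codes)
print(unicode_codes)

# Both lists hold the same integers for plain ASCII text.
same_values = ascii_codes == unicode_codes
print(same_values)
```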
Issues
- How do you specify Unicode strings in a program? (You can't type most of the characters)
- External representation and I/O.
- Compatibility with 8-bit strings (comparison, coercion, regular expressions, etc.)
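As a rough sketch of the first two issues, the example below writes a character that cannot be typed directly and converts it to an external byte representation. The \uXXXX escape, the "utf-8" codec name, and the sample text are assumptions not taken from this section.

```python
# A character you cannot type can be written with a \uXXXX escape.
text = u"caf\u00e9"          # U+00E9 is LATIN SMALL LETTER E WITH ACUTE

# External representation: encode to bytes for I/O, decode when reading back.
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")

print(len(text))      # number of characters
print(len(encoded))   # number of bytes; U+00E9 needs two bytes in UTF-8

# Encoding followed by decoding should give back the original string.
round_trip_ok = decoded == text
print(round_trip_ok)
```

The character count and the byte count differ, which is why the external representation has to be chosen explicitly when doing I/O.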
Note:
- When discussing Unicode, U+xxxx is used to indicate a Unicode character value.
- Ex: U+006A
- This is not Python syntax.
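To relate the U+xxxx notation to actual Python code, here is a small sketch. The helper u_plus is hypothetical (it is not a built-in), and the \u006a escape is assumed as the Python spelling of U+006A.

```python
def u_plus(ch):
    """Format one character in U+xxxx notation (hypothetical helper)."""
    return "U+%04X" % ord(ch)

# U+006A is the letter "j"; in Python source it is written with an escape.
letter = u"\u006a"

print(u_plus(letter))
same_char = letter == u"j"
print(same_char)
```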