Unicode

Unicode Literals

  • Precede string literal with u
     a = "Hello"      # 8-bit string   : 45 65 6c 6c 6f
     b = u"Hello"     # Unicode string : 0045 0065 006c 006c 006f
  • Specifying Unicode characters. Use \uxxxx.
     c = u"\u10f2\u0455"
     d = u"\u0045\u0065\u006c\u006c\u006f"
  • Raw Unicode strings
     e = ur"M\u00fcller\n"   

Comments

  • Python source files are 8-bit ASCII.
  • Unicode string literals are not written in any special encoding (UTF-8, UTF-16, etc.)
  • Raw strings a little strange. All escape codes uninterpreted except for \u.
  • -U option makes all string literals Unicode (the default in the future??)
<<< O'Reilly OSCON 2001, New Features in Python 2, Slide 50
July 26, 2001, beazley@cs.uchicago.edu
>>>