Unicode - Encodings

  • Python provides the following built-in string encodings (among others)
     'ascii'                   # 7-bit ASCII (0-127)
     'latin-1', 'iso-8859-1'   # 8-bit ISO 8859-1 (Latin-1), code points 0-255
     'utf-8'                   # 8-bit variable length encoding
     'utf-16'                  # 16-bit variable length encoding, with byte order mark
     'utf-16-le'               # UTF-16 little endian, no byte order mark
     'utf-16-be'               # UTF-16 big endian, no byte order mark
     'unicode-escape'          # Format used in u"xxxxx" literals
     'raw-unicode-escape'      # Format used in ur"xxxxx" literals
  • To encode: s.encode([encoding [,errors]])
     >>> s = u"Hello"
     >>> s.encode('utf-8')
     'Hello'
     >>> s.encode('utf-16-le')
     'H\000e\000l\000l\000o\000'
     >>> s.encode('utf-16-be')
     '\000H\000e\000l\000l\000o'
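  • The examples above use only ASCII text and never pass the optional errors argument. A minimal sketch of what happens with a non-ASCII character follows. It assumes Python 2.6 or later (it also runs on Python 3.3+), because the b'...' literals and the "except ... as" syntax are newer than the Python 2.0/2.1 interpreters the slide targets. The sample string and all variable names are illustrative only.

```python
# -*- coding: ascii -*-
# Sketch only: sample text and variable names are hypothetical.
s = u"caf\xe9"                      # 'cafe' with e-acute, U+00E9

# The same character needs a different number of bytes per encoding.
utf8 = s.encode('utf-8')            # e-acute becomes two bytes: C3 A9
latin1 = s.encode('latin-1')        # e-acute becomes one byte: E9
utf16le = s.encode('utf-16-le')     # two bytes per character here, no BOM
utf16 = s.encode('utf-16')          # same data, prefixed by a 2-byte BOM

# With the default error mode ('strict'), a character the target
# encoding cannot represent raises UnicodeError.
try:
    s.encode('ascii')
    strict_failed = False
except UnicodeError as exc:
    strict_failed = True

# The errors argument selects a different policy.
replaced = s.encode('ascii', 'replace')   # unencodable chars become '?'
ignored = s.encode('ascii', 'ignore')     # unencodable chars are dropped
```

    Here 'replace' and 'ignore' both lose information, so they suit display or logging better than storage; 'strict' is the default so that such loss never happens silently.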
O'Reilly OSCON 2001, New Features in Python 2, Slide 52
July 26, 2001, beazley@cs.uchicago.edu