Unicode - Decoding

Decoding

  • Conversion of raw byte-streams back into Unicode strings
  • unicode(s [, encoding [, errors]])
     >>> e = 'H\000e\000l\000l\000o\000'
     >>> unicode(e,'utf-16-le')
     u'Hello'
     >>> unicode('hello', 'utf-8')
     u'hello'
     >>>
  • Of course, to properly decode a string, you need to know what encoding was used
  • Usually, this is obtained elsewhere (e.g., MIME header)
     Content-Type: text/plain; charset=utf-8
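
A minimal sketch of the points above, not a definitive recipe. It assumes the encoding arrives as the `charset` parameter of a MIME Content-Type header, which is where MIME messages typically carry it. The header text, the sample bytes, and all variable names are made up for illustration. It uses the `.decode()` method on byte strings, which takes the same encoding and errors arguments as `unicode()`, so that the same code should run on Python 2.6+ and Python 3.

```python
from email import message_from_string

# Hypothetical message: one header, a blank line, and an empty body.
raw_message = "Content-Type: text/plain; charset=utf-8\n\n"
msg = message_from_string(raw_message)

# Read the charset parameter from the Content-Type header.
# The argument is the fallback returned when no charset is declared.
charset = msg.get_content_charset('ascii')

# Raw bytes as they might arrive off the wire (UTF-8 for "cafe" with an acute e).
payload = b'caf\xc3\xa9'
text = payload.decode(charset)

# The optional errors argument controls what happens to bytes that are
# not valid in the chosen encoding. The default, 'strict', raises
# UnicodeDecodeError.
bad = b'abc\xff'
replaced = bad.decode('utf-8', 'replace')   # invalid byte becomes U+FFFD
ignored = bad.decode('utf-8', 'ignore')     # invalid byte is dropped

try:
    bad.decode('utf-8')                     # same as errors='strict'
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Falling back to 'ascii' when no charset is declared is a conservative choice made here for the sketch: undeclared non-ASCII input then fails loudly under the default 'strict' handling instead of being silently misread.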
<<< O'Reilly OSCON 2001, New Features in Python 2, Slide 53
July 26, 2001, beazley@cs.uchicago.edu
>>>