Unicode - External Representation

The Encoding Problem

  • Often have to read and write Unicode strings to files and other byte-streams
  • What data encoding do you use?
     a = u"Hello"       # Unicode: 0048 0065 006c 006c 006f
     f = open("foo","w")
     f.write(a)         # ???? 
  • Little endian encoding (least significant byte first)
     48 00 65 00 6c 00 6c 00 6f 00
  • Big endian encoding (most significant byte first)
     00 48 00 65 00 6c 00 6c 00 6f
  • Variable length encoding? (e.g. UTF-8)
     48 65 6c 6c 6f
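  • A minimal sketch of how the three byte layouts listed above could be
    produced with the string's encode() method. The codec names "utf-16-le"
    and "utf-16-be" are an assumption for the little and big endian cases
    (the slide does not name a specific 16-bit encoding), and hex_bytes is
    a hypothetical helper added only for display:

```python
import binascii

a = u"Hello"

def hex_bytes(data):
    # Hypothetical display helper: render a byte string as
    # space-separated two-digit hex values.
    h = binascii.hexlify(data).decode("ascii")
    return " ".join(h[i:i + 2] for i in range(0, len(h), 2))

# Assumed codec names; the slide only says "little endian" / "big endian".
little = hex_bytes(a.encode("utf-16-le"))   # least significant byte first
big = hex_bytes(a.encode("utf-16-be"))      # most significant byte first
utf8 = hex_bytes(a.encode("utf-8"))         # variable length encoding

print(little)
print(big)
print(utf8)
```

    Whichever encoding is used to write the bytes, the reader has to decode
    with the same one (for example, data.decode("utf-8")) to get the
    original string back.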
<<< O'Reilly OSCON 2001, New Features in Python 2, Slide 51
July 26, 2001, beazley@cs.uchicago.edu
>>>