Unicode - I/O
codecs example: Autodetection of XML encoding
- An XML document usually starts with an XML declaration: <?xml ...?>
- The encoding can be determined by looking at the first few bytes of input
3C 3F 78 6D # UTF-8, ASCII, Latin-1
3C 00 3F 00 # UTF-16-LE
00 3C 00 3F # UTF-16-BE
...
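Each row of the table above is just the opening characters "<?xm" of the declaration, encoded in that encoding. The short Python 3 sketch below reproduces the rows. The helper name xml_signature is hypothetical and not part of any library.

```python
def xml_signature(encoding):
    """Return the first 4 bytes of an XML declaration in `encoding` (hypothetical helper)."""
    # "<?xml".encode(...) with utf-16-le/-be adds no BOM, so the bytes start at '<'
    return "<?xml".encode(encoding)[:4]

for enc in ("utf-8", "latin-1", "utf-16-le", "utf-16-be"):
    # bytes.hex(sep) needs Python 3.8+
    print(f"{xml_signature(enc).hex(' ')}  # {enc}")
```

UTF-8 and Latin-1 share a signature because "<?xm" is plain ASCII, so the bytes alone cannot tell them apart. The encoding="..." attribute in the declaration settles it.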
- Using the codecs module to decode the file
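import codecs

encodings = {
    b'\x3c\x3f\x78\x6d' : 'utf-8',
    b'\x3c\x00\x3f\x00' : 'utf-16-le',
    b'\x00\x3c\x00\x3f' : 'utf-16-be' }
f = open("foo.xml", "rb")          # binary mode: we need the raw bytes
enc = encodings[f.read(4)]         # sniff the first 4 bytes
f.seek(0)                          # rewind so the reader also sees "<?xm"
reader = codecs.lookup(enc)[2]     # item [2] of CodecInfo is the StreamReader class
fr = reader(f)                     # fr.read() now returns decoded text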
...
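The same detection can be run end to end against in-memory byte streams. The Python 3 sketch below is self-contained. The helper autodetect_reader is hypothetical. It uses codecs.getreader(enc), which is equivalent to codecs.lookup(enc).streamreader. Falling back to UTF-8 when no signature matches is an assumption, not something the slide specifies.

```python
import codecs
import io

# Signature table from the slide: first 4 bytes of "<?xm" per encoding
SIGNATURES = {
    b"\x3c\x3f\x78\x6d": "utf-8",
    b"\x3c\x00\x3f\x00": "utf-16-le",
    b"\x00\x3c\x00\x3f": "utf-16-be",
}

def autodetect_reader(stream, default="utf-8"):
    """Wrap a seekable binary stream in a StreamReader chosen by sniffing.

    Hypothetical helper; `default` is an assumed fallback used when the
    first 4 bytes match no known signature.
    """
    head = stream.read(4)
    stream.seek(0)                          # rewind: the reader must see "<?xml"
    encoding = SIGNATURES.get(head, default)
    return codecs.getreader(encoding)(stream), encoding

doc = '<?xml version="1.0"?><msg>h\u00e9llo</msg>'
for enc in ("utf-8", "utf-16-le", "utf-16-be"):
    reader, detected = autodetect_reader(io.BytesIO(doc.encode(enc)))
    print(detected, reader.read() == doc)
```

Rewinding matters. Without the seek(0), the reader would start at byte 4 and the decoded text would lose "<?xm" (or, for UTF-16, "<?").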