Monday, December 21, 2009

More About Python 3.1 Unicode Identifiers

In a previous post, I did a simple demonstration of Unicode decomposition and normalization with the Latin character ä. This time I will do the same demonstration with non-Latin characters whose code points lie beyond 255.

To the interpreter!

We'll need to insert a couple of screenshots to make sure the Malayalam characters I'm using show up.

>>> import unicodedata
>>> for ltr in malayalamword[:4]:
    print('{0:<30} {1:>#6x} {2}'.format(
        unicodedata.name(ltr),
        ord(ltr), ltr))
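
The definition of malayalamword lives in the screenshot, so the loop above cannot be pasted and run as it stands. Here is a minimal sketch of the same loop, assuming a stand-in string (the name sample is mine, and it is not the word from the screenshot) built from the two characters used later in this post, letter RA and vowel sign O:

```python
import unicodedata

# Stand-in string (an assumption, NOT the author's word):
# MALAYALAM LETTER RA followed by MALAYALAM VOWEL SIGN O.
sample = chr(0xd30) + chr(0xd4a)

rows = []
for ltr in sample:
    # Same format spec as the loop above: name, hex code point, character.
    row = '{0:<30} {1:>#6x} {2}'.format(
        unicodedata.name(ltr), ord(ltr), ltr)
    rows.append(row)
    print(row)
```

Each line shows the character's Unicode name, its code point in hex, and the character itself.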

>>> # decompose fourth letter (vowel sign O)
>>> unicodedata.decomposition(chr(0xd4a))
'0D46 0D3E'
>>> # make identifier
>>> # use letter RA to prevent error
>>> exec(chr(0xd30) + chr(0xd4a) +
         ' = 22')
>>> eval(chr(0xd30) + chr(0xd4a))
22
>>> # attempt to use eval with decomposition
>>> eval(chr(0xd30) + chr(0xd46) +
         chr(0xd3e))
22
>>> # same value: both spellings name one identifier
>>> # now set identifier with decomposed char
>>> exec(chr(0xd30) + chr(0xd46) +
         chr(0xd3e) + ' = 44')
>>> eval(chr(0xd30) + chr(0xd46) +
         chr(0xd3e))
44
>>> eval(chr(0xd30) + chr(0xd4a))
44
>>> # find representation of identifier
>>> localsx = locals()
>>> localidentifiers = [idx for idx
            in localsx]
>>> localidentifiers.sort()
>>> localidentifiers.reverse()
>>> ord(localidentifiers[0][1])
3402
>>> hex(3402)
'0xd4a'
>>> # normalizes to single character
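
The session above works because Python normalizes identifiers to NFKC while parsing (this is specified in PEP 3131). A short sketch of the same effect using unicodedata.normalize directly; the variable names composed, decomposed and namespace are mine, chosen only for illustration:

```python
import unicodedata

composed = chr(0xd30) + chr(0xd4a)                  # RA + vowel sign O
decomposed = chr(0xd30) + chr(0xd46) + chr(0xd3e)   # RA + the two parts of O

# As plain strings the two spellings are different code point sequences.
print(composed == decomposed)

# NFKC recomposes the pair into the single vowel sign;
# NFD splits it back apart.
print(unicodedata.normalize('NFKC', decomposed) == composed)
print(unicodedata.normalize('NFD', composed) == decomposed)

# Binding through the decomposed spelling stores the composed key,
# because the parser normalizes the identifier before using it.
namespace = {}
exec(decomposed + ' = 44', namespace)
print(composed in namespace, decomposed in namespace)
```

Using an explicit dictionary for exec keeps the experiment out of the interpreter's own namespace, which makes the stored key easier to inspect than sorting locals().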

Notes:

1) I had to use the letter RA to start the identifier; the vowel sign is a combining character, and a combining character can continue an identifier but cannot start one.

2) Armed with a little knowledge of Unicode, I find the unicodedata module quite handy. I owe a lengthy post to this module and its author, but I don't quite have a handle on its full scope yet.
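
Note 1 can be checked from the interpreter without resorting to exec. A small sketch, assuming only the standard unicodedata.category function and the str.isidentifier method; the names ra and sign_o are mine:

```python
import unicodedata

ra = chr(0xd30)       # MALAYALAM LETTER RA
sign_o = chr(0xd4a)   # MALAYALAM VOWEL SIGN O

# General categories: the letter is 'Lo' (other letter), the vowel sign
# is 'Mc' (spacing combining mark). Marks may continue an identifier
# but may not start one.
print(unicodedata.category(ra))
print(unicodedata.category(sign_o))

# The vowel sign alone is rejected; preceded by RA it is accepted.
print(sign_o.isidentifier())
print((ra + sign_o).isidentifier())
```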
