In a previous post, I did a simple demonstration of Unicode decomposition and normalization with the Latin character ä. This time I will do the same demonstration with non-Latin characters whose code points lie beyond 255.
To the interpreter!
We'll need to insert a couple screenshots to make sure the Malayalam characters I'm using show up.
>>> import unicodedata
>>> for ltr in malayalamword[:4]:
...     print('{0:<30} {1:>#6x} {2}'.format(
...         unicodedata.name(ltr),
...         ord(ltr), ltr))
...
>>> # decompose fourth letter (vowel sign O)
>>> unicodedata.decomposition(chr(0xd4a))
'0D46 0D3E'
>>> # make identifier
>>> # use letter RA to prevent error
>>> exec(chr(0xd30) + chr(0xd4a) +
...      ' = 22')
>>> eval(chr(0xd30) + chr(0xd4a))
22
>>> # attempt to use eval with decomposition
>>> eval(chr(0xd30) + chr(0xd46) +
...      chr(0xd3e))
22
>>> # same value: the decomposed spelling names the same identifier
>>> # now set identifier with decomposed char
>>> exec(chr(0xd30) + chr(0xd46) +
...      chr(0xd3e) + ' = 44')
>>> eval(chr(0xd30) + chr(0xd46) +
...      chr(0xd3e))
44
>>> eval(chr(0xd30) + chr(0xd4a))
44
>>> # find representation of identifier
>>> localsx = locals()
>>> localidentifiers = [idx for idx
...                     in localsx]
>>> localidentifiers.sort()
>>> localidentifiers.reverse()
>>> ord(localidentifiers[0][1])
3402
>>> hex(3402)
'0xd4a'
>>> # the identifier is stored normalized, as the single composed character
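The same normalization can be checked directly on strings, without going through exec and eval. This is only a small sketch: it assumes Python 3 and uses NFKC because that is the form PEP 3131 specifies for identifiers. The variable names composed, decomposed and normalized are my own, not part of the session above.

```python
import unicodedata

# RA + VOWEL SIGN O (the single composed vowel sign)
composed = chr(0xd30) + chr(0xd4a)
# RA + the two code points that VOWEL SIGN O decomposes into
decomposed = chr(0xd30) + chr(0xd46) + chr(0xd3e)

# As raw code point sequences the two strings are different.
print(composed == decomposed)        # False

# NFKC normalization maps the decomposed spelling onto the composed one.
normalized = unicodedata.normalize('NFKC', decomposed)
print(normalized == composed)        # True
print([hex(ord(ch)) for ch in normalized])
```

If the interpreter applies this normalization to identifiers while parsing, that would explain why both spellings ended up pointing at one name in the session.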
Notes:
1) I had to use the letter RA to start the identifier; the vowel sign is a combining character and cannot be used to start an identifier.
2) Armed with a little knowledge of Unicode, I find the unicodedata module quite handy. I owe a lengthy post to this module and its author, but I don't quite have a handle on its full scope yet.
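Note 1 can be checked without triggering a SyntaxError by asking the string itself. A rough sketch, assuming Python 3's str.isidentifier() and unicodedata.category(); the names ra and sign_o are mine, chosen for illustration.

```python
import unicodedata

ra = chr(0xd30)       # MALAYALAM LETTER RA
sign_o = chr(0xd4a)   # MALAYALAM VOWEL SIGN O

# A vowel sign on its own is not a valid identifier...
print(sign_o.isidentifier())          # False
# ...but it is accepted once a letter comes first.
print((ra + sign_o).isidentifier())   # True

# The general categories hint at why: a letter versus a mark.
print(unicodedata.category(ra), unicodedata.category(sign_o))
```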