In my last post I tried out Python identifiers and UTF-8 encoded text in a fairly well-established language: Armenian (with a little Russian thrown in at the end).
This time I'll take on something a bit more challenging: Lao, a Southeast Asian language.
Before I get started, a quick recap:
1) Python version 3.0 and higher can handle Unicode identifiers and UTF-8 encoded source.
2) Getting non-ASCII characters to show up correctly, from least difficult medium to most difficult:
a) Web Browser
b) Desktop Application
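To make point 1 concrete, here is a minimal sketch (my own illustration, not code from the earlier post) of a Lao identifier in Python 3 source saved as UTF-8. The variable name is a short piece of Lao text chosen for the example, and the value assigned to it is only a placeholder.

```python
# Python 3 source files are read as UTF-8 by default, so no coding
# declaration is needed. The identifier below is Lao text; the value
# assigned to it is only a placeholder for this example.
ຊື່ = "placeholder value"

# str.isidentifier() reports whether a string would be accepted as a name.
print("ຊື່".isidentifier())
print(ຊື່)
```

Whether the name is readable in your editor is a separate question from whether Python accepts it, which is the rendering problem the list above is about.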
Lao is a bit of an exception, at least the way I have things set up. The desktop GUI (IDLE), with some tweaking, works pretty well. The Python Wiki in a web browser, not so much.
Here's the LaoLanguage page on the Python Wiki (in Opera on my laptop):
Yuck! It's supposed to be seven characters wide (not eleven), and what we in the West would call accent marks are supposed to sit directly above the character they modify, not to the northeast of it.
Well, let's try it in IDLE. After some messing around with the font, I discovered that Courier 10 point size 14 seems to yield the best results:
Not perfect, but not bad either. For comparison, this is what it's supposed to look like, taken from laoconnection:
From here on out we should be able to dispense with the screenshots. I assigned the text I pasted into IDLE to the variable laostring. Now we can inspect it a little:
>>> len(laostring)
11
>>> # only 7 character widths
>>> import unicodedata
>>> for charx in laostring:
...     print(unicodedata.name(charx))
...
LAO LETTER SO TAM
LAO VOWEL SIGN YY
LAO TONE MAI EK
LAO LETTER KHO SUNG
LAO LETTER O
LAO TONE MAI THO
LAO LETTER YO
LAO VOWEL SIGN EI
LAO LETTER MO
LAO TONE MAI EK
LAO LETTER NO
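The gap between eleven codepoints and seven character widths can be checked from the names alone. The sketch below is my own illustration. It rebuilds the string from the names in the listing above with unicodedata.lookup (so nothing depends on pasting Lao text correctly) and treats every codepoint whose general category starts with "M" (a combining mark) as taking no width of its own. That is only a rough stand-in for what a text renderer actually does, and the helper name count_widths is made up for this example.

```python
import unicodedata

# Names copied from the listing above, in the same order.
NAMES = [
    "LAO LETTER SO TAM",
    "LAO VOWEL SIGN YY",
    "LAO TONE MAI EK",
    "LAO LETTER KHO SUNG",
    "LAO LETTER O",
    "LAO TONE MAI THO",
    "LAO LETTER YO",
    "LAO VOWEL SIGN EI",
    "LAO LETTER MO",
    "LAO TONE MAI EK",
    "LAO LETTER NO",
]

# unicodedata.lookup() maps an official character name back to the character.
laostring = "".join(unicodedata.lookup(name) for name in NAMES)


def count_widths(text):
    """Count codepoints that are not combining marks (categories Mn, Mc, Me)."""
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


print(len(laostring))                  # 11 codepoints
print(count_widths(laostring))         # 7 base characters
print(len(laostring.encode("utf-8")))  # 33 bytes, 3 per Lao codepoint
```

Note that unicodedata.combining() would not work as the test here: it returns the canonical combining class, which is 0 for some Lao vowel signs even though they are drawn above the consonant.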
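The Unicode Standard doesn't handle all scripts the same way. Whereas "ä" can be stored as a single precomposed codepoint (U+00E4), the single character (sort of) made up of LAO LETTER SO TAM, LAO VOWEL SIGN YY and LAO TONE MAI EK always takes up three, because Unicode has no precomposed forms for Lao.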
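Here is a small sketch of that difference using unicodedata.normalize (my own example, not part of the original session). "ä" can be decomposed into two codepoints and composed back into one, while the Lao cluster stays at three codepoints in either form. The characters are written with \N{...} escapes so the code itself is plain ASCII.

```python
import unicodedata

a_umlaut = "\N{LATIN SMALL LETTER A WITH DIAERESIS}"
lao_cluster = (
    "\N{LAO LETTER SO TAM}"
    "\N{LAO VOWEL SIGN YY}"
    "\N{LAO TONE MAI EK}"
)

for label, text in [("a-umlaut", a_umlaut), ("Lao cluster", lao_cluster)]:
    composed = unicodedata.normalize("NFC", text)    # composed form
    decomposed = unicodedata.normalize("NFD", text)  # decomposed form
    print(label, len(composed), len(decomposed))

# a-umlaut 1 2
# Lao cluster 3 3
```

So normalizing to NFC before calling len() helps for "ä", but it will not make the Lao string report seven.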
Stuff like this is second nature to people who work with Southeast Asian languages daily. For me, it takes some getting used to, especially the string length thing. Rendering is an entirely different problem, but I'm hoping that with practice and experimentation I'll get a better handle on what works. Suggestions welcome.