Monday, October 5, 2009

Python 3.0/3.1 and Unicode Identifiers

Lately, I've been attempting to update the Python Wiki's non-English language pages with code snippets that highlight the ability of Python 3.0 and Python 3.1 to accept non-ASCII identifiers in UTF-8 encoded source files. Here's an example from the Armenian page (http://wiki.python.org/moin/ArmenianLanguage):

# python 3.0/3.1
# English
to_say = ['hello',
          'Good morning',
          'how are you']
# common Armenian name
name = ['Ashot', 'Armen', 'Anahit']
for namex in name:
    for greeting in to_say:
        print(greeting + ' ' + namex)

# Հայերեն լեզու
ասել = ['Ողջույն',
        'Բարի օր',
        'Ինչպե՞ս եք']
# տարածված Հայկական անուն
անուն = ['Աշոտ', 'Արմեն', 'Անահիտ']
for անունx in անուն:
    for Ողջույնx in ասել:
        print(Ողջույնx + ' ' + անունx)


I've saved this code snippet in a file called armenian.py, which I'll make use of shortly.
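A quick way to see which of these names Python 3 will accept as identifiers is the built-in str.isidentifier() method. The following is only a minimal sketch; the helper name check_names is my own invention for illustration and is not part of the wiki snippet.

```python
# Identifiers and greeting strings copied from the Armenian snippet above.
identifiers = ['ասել', 'անուն', 'անունx', 'Ողջույնx']
greetings = ['Բարի օր', 'Ինչպե՞ս եք']


def check_names(names):
    """Map each candidate name to whether Python 3 accepts it as an identifier.

    check_names is a hypothetical helper used only for this illustration.
    """
    return {name: name.isidentifier() for name in names}


# The identifiers pass; the greetings contain a space (and, in one case,
# a punctuation mark), so they only work as string values.
for name, ok in check_names(identifiers + greetings).items():
    print(name, ok)
```

The greetings are fine as string contents, of course; the check only matters for names used as variables.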

After getting the Python Wiki set up with a handful of these, it occurred to me that I hadn't actually run any of the scripts. My first attempts didn't work out so well. I was trying to run them from Konsole on KDE under FreeBSD 7.2. Googling and a look through the Konsole handbook turned up nothing (I suspect a good how-to exists in both of these sources; I just couldn't find it in a time-efficient manner). I did note, however, that I could get the non-ASCII characters to show up in IDLE, the default Python editor, on Windows.
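I never pinned down what was wrong with the Konsole setup, so the following is only a guess at where to start looking: a small sketch that reports which encodings the interpreter thinks it is working with. The function name describe_encodings is hypothetical; the calls it makes are from the standard sys and locale modules.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python reports for output, locale, and filenames.

    describe_encodings is a hypothetical helper for this illustration.
    """
    return {
        # May be None when stdout is not attached to a real terminal or file.
        'stdout': getattr(sys.stdout, 'encoding', None),
        # Encoding implied by the current locale settings.
        'preferred': locale.getpreferredencoding(False),
        # Encoding used for file names.
        'filesystem': sys.getfilesystemencoding(),
    }


for key, value in describe_encodings().items():
    print(key, value)
```

If the stdout entry is something other than a UTF-8 variant, printing Armenian text from a terminal session is likely to fail or come out garbled, which would at least point toward the locale settings rather than Python itself.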

It could be that my ports collection is out of date, but I did not have access to tkinter (required to run IDLE) for Python 3.0 on the FreeBSD machine. I downloaded the Python 3.1 source from python.org and compiled it outside the ports system. Lo and behold, I had access to the Unicode characters within the Python interpreter.

Inside IDLE:

Python 3.1.1 (r311:74480, Oct 3 2009, 21:55:31)
[GCC 4.2.1 20070719 [FreeBSD]] on freebsd7
Type "copyright", "credits" or "license()" for more information.
>>> import armenian
hello Ashot
Good morning Ashot
how are you Ashot
hello Armen
Good morning Armen
how are you Armen
hello Anahit
Good morning Anahit
how are you Anahit
Ողջույն Աշոտ
Բարի օր Աշոտ
Ինչպե՞ս եք Աշոտ
Ողջույն Արմեն
Բարի օր Արմեն
Ինչպե՞ս եք Արմեն
Ողջույն Անահիտ
Բարի օր Անահիտ
Ինչպե՞ս եք Անահիտ
>>>
 

Well, if that isn't beautiful, I don't know what is.

There still remains the problem of typing the text in (the mini armenian.py program was copied from a webpage). I found gvim and its help files useful here.

What worked for me in gvim:

:se encoding=utf-8
:se gfn=-misc-fixed-medium-r-normal--18-120-100-100-c-90-iso10646-1


To find available keyboard layouts:

:echo globpath(&rtp, "keymap/*.vim")

I saw one called russian-yawerty (presumably this is similar to QWERTY, so worth a try):

:se keymap=russian-yawerty

Вот карандаш

:se keymap= 

takes me back to the default (US English).

This was enough to get me started. I hope to become more proficient as I gain experience with this.

Notes:

1) Hasmik of CalPoly was kind enough to provide me with the Armenian presented here. Many thanks.

2) I think the Russian phrase means "Here is a pencil."

3) Armenian doesn't use the Latin question mark (?). Instead, the character that appears as a superscript in the first word of the Armenian phrase for "How are you?" serves this purpose. Unicode calls it the Armenian question mark (U+055E).
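For the curious, the standard unicodedata module can identify the mark mentioned in note 3. This is a small sketch; the function name punctuation_in is hypothetical and exists only for this example.

```python
import unicodedata

# The Armenian phrase for "How are you?" from the snippet above.
phrase = 'Ինչպե՞ս եք'


def punctuation_in(text):
    """Return (character, Unicode name) pairs for punctuation found in text.

    punctuation_in is a hypothetical helper for this illustration.
    """
    return [(ch, unicodedata.name(ch))
            for ch in text
            # Unicode general categories starting with 'P' are punctuation.
            if unicodedata.category(ch).startswith('P')]


for ch, name in punctuation_in(phrase):
    print('U+%04X' % ord(ch), name)
```

Because the mark is classified as punctuation rather than as a letter, it can appear inside string values but not inside an identifier.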
