pyright

Tuesday, March 9, 2010

Attempt at an RTL Editor for Python

In previous posts, I've written about the possibilities offered by Python 3.1's Unicode identifier capability, as well as the new challenges posed when one tries to display them on screen.

As the final project for my Java course, I set about trying to create an editor that would allow the user to enter text right to left, but save it as valid interpretable (is that a word?) Python 3.x code:

There's still a good deal of work to be done on this, but this is a start. Hopefully I'll have something more robust next time.
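To make the goal concrete, here is a minimal sketch (in Python, not the Java editor itself) of what "valid Python 3.x code" means for a right-to-left name. The Hebrew word used as the identifier and the helper name `is_rtl` are assumptions chosen purely for illustration.

```python
import unicodedata

# Hypothetical example identifier: the Hebrew word "shalom",
# written with escapes so this source stays pure ASCII.
NAME = "\u05e9\u05dc\u05d5\u05dd"

def is_rtl(ch):
    """True if the character's bidirectional class is right-to-left."""
    return unicodedata.bidirectional(ch) in ("R", "AL")

# Python 3 accepts the word as an identifier...
print(NAME.isidentifier())
# ...and every character in it belongs to a right-to-left class.
print(all(is_rtl(ch) for ch in NAME))

# However an editor displays the name, the interpreter only sees
# the characters in logical (typing) order.
namespace = {}
exec(NAME + " = 42", namespace)
print(namespace[NAME])
```

The display order is the editor's problem; the saved file only has to hold the characters in the order they were typed.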

Wednesday, February 24, 2010

Unicode Poster From Pycon 2010 Up

I posted my poster to Slideshare in OpenOffice presentation format. The file is about 13 or 14 MB in size. The embedded poster was in PNG format to preserve the shape of the foreign glyphs.

A number of people asked me at the poster session for sources. Here are the main ones:

1) Wikipedia - perhaps not the ultimate authority on all things, but a good place to research foreign languages and scripts.

2) The O'Reilly book Fonts and Encodings by Haralambous. If you know little about Unicode and fonts, this is the next best thing to Knuth.

3) The Python 3.1 interpreter and the unicodedata module. Once you get the basics of Unicode down, the unicodedata module has most of what you'll need.

4) Google searches and language promotion websites - laoconnection is a site that comes to mind. Most people are proud of their languages and culture and want to share them.

Thanks to everyone who stopped by the poster. That was fun.

Tuesday, February 23, 2010

FOSS Conference Economics

Just got back from Pycon - great show.

I've had some time to reflect on how to make conference going affordable, and where my money goes. This year I was partially funded by my employer. I was quite grateful, as I wasn't expecting anything.

What has concerned me in the past is the amount of money spent on travel and hotels. If you live inside the US, the biggest chunk of what you spend on Pycon (US) goes to the hotel. This is where people (or at least I) say, "Hey, wait a second, all my monetary support for the Open Source Software movement is going to the hotel industry!" Not so fast - although your money doesn't support FOSS directly, it does keep the conference from *losing* money. To secure a hotel/convention facility for more than 1000 people, there has to be a commitment on rooms. I've seen other devs stay at cheaper hotels for conferences - this is a good approach if it's done out of necessity. I generally try to stay at the conference hotel in order to support the continued success of the conference - to make sure it doesn't lose money.

The travel argument goes roughly the same way - you can't have a conference if people don't show. Even though most of your money is going to the airlines (in the case of Pycon(US) for those outside the United States), your attendance is a plus.

Sunday, February 7, 2010

Handling UnicodeEncodeError in the Console (Python 3.1)

I've been working with a lot of different foreign scripts for the past six months or so. Ideally I like to work in the console where possible. An error that always comes up is the following:

[carl@pcbsd]/home/carl(139)% python3.1
Python 3.1.1 (r311:74480, Jan 17 2010, 23:15:26)
[GCC 4.2.1 20070719 [FreeBSD]] on freebsd7
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u0400')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u0400' in position 0: ordinal not in range(128)
>>>

After a while this can get pretty annoying. There are a number of ways to get around the problem. I don't know much about most of the languages I'm dealing with, so I prefer the Unicode code charts' capitalized ASCII descriptions to glyphs or empty boxes. Fortunately the unicodedata module has all this information available.
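One of those other ways, sketched here for comparison: encode with an explicit error handler so that unencodable characters become escapes instead of raising. Both `str.encode(..., "backslashreplace")` and the `ascii()` builtin are standard Python 3; the helper name `printable` is made up for this example.

```python
def printable(text, encoding="ascii"):
    """Return text with unencodable characters replaced by backslash escapes."""
    return text.encode(encoding, "backslashreplace").decode(encoding)

print(printable("hello\u0400"))   # prints an escape rather than raising
print(ascii("\u0400"))            # the builtin gives a quoted, escaped repr
```

This never fails, but it shows code points rather than names, which is not much help when the script is unfamiliar.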

To get the output I wanted I came up with a little script:

# mockprint.py - wrapper around print
# function to handle
# UnicodeEncoding errors

# python 3.1

import unicodedata

ERRORSTR = "'ascii' codec can't encode character "
CHARIDX = 5
POSITIDX = 8
POSITIDX2 = 7

def mockprint(stringx):
    """
    Wrapper for print() function that
    replaces unprintable characters
    with their Unicode names.
    """
    try:
        print(stringx)
    except UnicodeEncodeError as e:
        # main cases:
        # 1) one character can't be printed
        # 2) multiple characters in a row can't be printed
        # 3) unicode character is first or last in string
        # 4) other ascii characters surround the unicode ones
        reasonx = str(e)
        reasonx = reasonx.split(' ')
        idx = reasonx[POSITIDX]
        # more than 1 char in a row can't be printed
        if idx == 'ordinal':
            idx = int(reasonx[POSITIDX2][0])
            if idx != 0:
                print(stringx[:idx])
            print(unicodedata.name(stringx[idx]))
            mockprint(stringx[(idx + 1):])
        # offending character shows up after ascii chars
        elif len(stringx) > 1:
            charx = int(reasonx[CHARIDX][3:-1], 16)
            charx = chr(charx)
            print(unicodedata.name(charx))
            mockprint(stringx[(int(idx[0]) + 1):])
        # end of the line
        elif len(stringx) == 1:
            charx = int(reasonx[CHARIDX][3:-1], 16)
            charx = chr(charx)
            print(unicodedata.name(charx))

A quick demo:

>>> import mockprint
>>> mockprint.mockprint('hello\u0401\u0402\u0403\u0404world')
hello
CYRILLIC CAPITAL LETTER IO
CYRILLIC CAPITAL LETTER DJE
CYRILLIC CAPITAL LETTER GJE
CYRILLIC CAPITAL LETTER UKRAINIAN IE
world

And something a bit more challenging:

A few foreign words in a number of different languages.

>>> fle = open('/home/carl/pythonblog/foreignbytestest', 'rt', encoding = 'UTF-8')
>>> import mockprint
>>> for linex in fle.readlines():
...     mockprint.mockprint(linex)
...
CJK UNIFIED IDEOGRAPH-65E5
CJK UNIFIED IDEOGRAPH-672C
CJK UNIFIED IDEOGRAPH-8A9E

abcde

ETHIOPIC SYLLABLE GLOTTAL A
ETHIOPIC SYLLABLE MAA
ETHIOPIC SYLLABLE RE
ETHIOPIC SYLLABLE NYAA

ARMENIAN CAPITAL LETTER HO
ARMENIAN SMALL LETTER AYB
ARMENIAN SMALL LETTER YI
ARMENIAN SMALL LETTER ECH
ARMENIAN SMALL LETTER REH
ARMENIAN SMALL LETTER ECH
ARMENIAN SMALL LETTER NOW

ORIYA LETTER O
ORIYA LETTER DDA
ORIYA SIGN NUKTA
ORIYA VOWEL SIGN I
ORIYA LETTER AA

LAO LETTER PHO TAM
LAO VOWEL SIGN AA
LAO LETTER SO SUNG
LAO VOWEL SIGN AA
LAO LETTER LO LOOT
LAO VOWEL SIGN AA
LAO LETTER WO

CYRILLIC SMALL LETTER ER
CYRILLIC SMALL LETTER U
CYRILLIC SMALL LETTER ES
CYRILLIC SMALL LETTER ES
CYRILLIC SMALL LETTER KA
CYRILLIC SMALL LETTER I
CYRILLIC SMALL LETTER SHORT I

CYRILLIC SMALL LETTER YA
CYRILLIC SMALL LETTER ZE
CYRILLIC SMALL LETTER YERU
CYRILLIC SMALL LETTER KA

Well, if that isn't beautiful, I don't know what is.

Seriously, this is a hack - parsing an error string and working backwards? I've got to be joking. Actually, no. For as much time as I've spent remembering after the fact that I can't print Unicode in the console, this is worth it, even if it's only good for Python 3.1.
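A less fragile variant, offered as a sketch rather than a replacement: UnicodeEncodeError carries `object`, `start` and `end` attributes, so the position does not have to be parsed out of the message text. This version encodes to an assumed target encoding itself instead of waiting for print() to fail, and it returns lines instead of printing them. The function name `name_lines` and the `U+XXXX` fallback for unnamed characters are assumptions of the example.

```python
import unicodedata

def name_lines(text, encoding="ascii"):
    """Split text into encodable runs and Unicode names of everything else."""
    lines = []
    while text:
        try:
            text.encode(encoding)
        except UnicodeEncodeError as e:
            # e.start and e.end delimit the unencodable slice of text.
            if e.start:
                lines.append(text[:e.start])
            for ch in text[e.start:e.end]:
                # Fall back to a code point label for unnamed characters.
                lines.append(unicodedata.name(ch, "U+%04X" % ord(ch)))
            text = text[e.end:]
        else:
            # Everything left can be encoded as-is.
            lines.append(text)
            break
    return lines

for line in name_lines("hello\u0401\u0402\u0403\u0404world"):
    print(line)
```

Because the slice bounds come from the exception object, positions past the ninth character and prefixes before a single offending character are handled the same way as everything else.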

Tuesday, February 2, 2010

py-openbsd's DoubleAssociation

I briefly covered this structure last time, but didn't do it justice. The idea of a two-way dictionary structure (keys and values are both keys) intrigued me. I wanted to give it a spin with a real world example.

I've chosen a simple example with a few domain names (common names) and IP addresses:

# dblassoc.py

import openbsd

# some ip addresses paired with domains
ips = {'google':(0x4a7d1393, 0xd8ef3d68),
       'openbsd':(0x8ef40c2a,),
       'freebsd':(0x45935321,),
       'yahoo':(0xd1bf5d34, 0xd183249e)}

# OK, we can now make the DoubleAssociation
ipsbothways = openbsd.utils.DoubleAssociation(ips)

print "ipsbothways['yahoo'] = " + str(ipsbothways['yahoo'])

# fair enough, but nothing we couldn't get from the dictionary

# try to query on an ip address to get a domain name
print "ipsbothways[(0x8ef40c2a,)] = " + ipsbothways[(0x8ef40c2a,)]

# unlike a normal dictionary, DoubleAssociation gives everything
# back with the keys() method

for keyx in ipsbothways.keys():
    try:
        if 0xd183249e in keyx:
            print "domain is " + ipsbothways[keyx]
            break
    except TypeError:
        print "TypeError: " + keyx

Python 2.5.4 (r254:67916, Jul 1 2009, 11:37:21)
[GCC 3.3.5 (propolice)] on openbsd4
Type "help", "copyright", "credits" or "license" for more information.
>>> import dblassoc
ipsbothways['yahoo'] = (3518979380L, 3515032734L)
ipsbothways[(0x8ef40c2a,)] = openbsd
TypeError: google
TypeError: openbsd
TypeError: yahoo
domain is yahoo
>>>

The one thing you have to look out for is the treatment of everything in the structure as a key - that's why I had to catch the TypeError. Everything is a value, too. The values and keys methods yield the same results.

In real life, if you had 30 or 50 or 1000 ip addresses, this would come in handy. Likewise for doctor-patient records, etc. (although the grouping of patients has to be unique, so that may not work after all - best to test both "sides" of the structure for exclusivity).
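For comparison, the IP-to-domain direction can also be had from a plain dictionary by indexing each address individually, which avoids looping over keys() and catching TypeError. This is a sketch in Python 3 (the script above is Python 2); it does not use the openbsd package, and the names `ip_to_domain` and `dotted` are invented for the example.

```python
import ipaddress

ips = {'google': (0x4a7d1393, 0xd8ef3d68),
       'openbsd': (0x8ef40c2a,),
       'freebsd': (0x45935321,),
       'yahoo': (0xd1bf5d34, 0xd183249e)}

# One entry per address rather than one per tuple of addresses.
ip_to_domain = {ip: name for name, addrs in ips.items() for ip in addrs}

def dotted(ip):
    """Format a 32-bit integer as a dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(ip))

print(ip_to_domain[0xd183249e], dotted(0xd183249e))
```

The trade-off is that this is two separate dictionaries to keep in sync, which is exactly the bookkeeping a two-way structure is meant to hide.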

Sunday, January 31, 2010

OpenBSD and Python

Last time we covered FreeBSD's third party module, freebsd; this time we'll take a quick look at the equivalent openbsd package for the OpenBSD operating system.

$ python2.5
Python 2.5.4 (r254:67916, Jul 1 2009, 11:37:21)
[GCC 3.3.5 (propolice)] on openbsd4
Type "help", "copyright", "credits" or "license" for more information.
>>> import openbsd
>>> dir(openbsd)
['__builtins__', '__doc__', '__file__', '__name__', '__path__', '_ifconfig', '_netstat', '_packetDescriptors', '_pcap', '_sysvar', 'arc4random', 'ifconfig', 'netstat', 'packet', 'pcap', 'utils']

Let's see what all is hidden in that utils item:

>>> dir(openbsd.utils)
['DoubleAssociation', '__builtins__', '__doc__', '__file__', '__name__', 'cksum16', 'ethToBytes', 'ethToStr', 'findLongestSubsequence', 'getBlocks', 'ip6FromPrefix', 'ip6ToBytes', 'ip6ToStr', 'ipFromPrefix', 'ipToBytes', 'ipToStr', 'isIP6Addr', 'isIPAddr', 'isStringLike', 'multichar', 'multiord']

OK, a fair number of network addressing related functions.

>>> help(openbsd.utils.ipFromPrefix)

ipFromPrefix(prefix)
    Produce an IPv4 address (netmask) from a prefix length.

That sounds handy. Let's give it a shot:

>>> openbsd.utils.ipFromPrefix(24)
'255.255.255.0'
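The conversion itself is simple bit arithmetic. Here is a sketch of the idea in plain Python 3; it is not taken from the package's source, and `netmask_from_prefix` is a made-up name.

```python
def netmask_from_prefix(prefix):
    """Return the dotted-quad IPv4 netmask for a prefix length of 0 to 32."""
    if not 0 <= prefix <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    # Set the top `prefix` bits of a 32-bit word and clear the rest.
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    # Split the word into four octets, most significant first.
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(netmask_from_prefix(24))
```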

>>> help(openbsd.utils.DoubleAssociation)

Help on class DoubleAssociation in module openbsd.utils:

class DoubleAssociation(__builtin__.dict)
| A double-association is a broadminded dictionary - it goes both ways.
|
| The rather simple implementation below requires the keys and values to
| be two disjoint sets. That is, if a given value is both a key and a
| value in a DoubleAssociation, you get unexpected behaviour.
|
| Method resolution order:
|      DoubleAssociation
|      __builtin__.dict
|      __builtin__.object
|
| Methods defined here:
|
| __init__(self, idict=None)
|      # FIXME:
|      #   While DoubleAssociation is adequate for our use, it is not entirely complete:
|      #       - Deletion should delete both associations
|      #       - Other dict methods that set values (eg. setdefault) will need to be over-ridden.

This one is kind of interesting - let's have a look:

>>> d = {1:'a', 2:'b', 3:'c'}
>>> d.get(1)
'a'
>>> print d.get('a')
None
>>> da = openbsd.utils.DoubleAssociation(d)
>>> da.get(1)
'a'
>>> da.get('a')
1

Just like the doc described it. Both the keys and the values are keys, if that makes sense.
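A minimal sketch of how such a structure can be built on top of dict, assuming only what the help text says (a dict subclass that stores both directions, with keys and values disjoint). It is written for Python 3, it is not the package's source, and the class name `TwoWayDict` is hypothetical. Unlike the version described in the FIXME, this one also removes both directions on deletion.

```python
class TwoWayDict(dict):
    """dict that stores every pair in both directions."""

    def __init__(self, pairs=None):
        super().__init__()
        for key, value in (pairs or {}).items():
            self[key] = value

    def __setitem__(self, key, value):
        # Store key -> value and value -> key. Values must be hashable
        # and must not collide with any key.
        super().__setitem__(key, value)
        super().__setitem__(value, key)

    def __delitem__(self, key):
        # Remove both directions of the association.
        value = self[key]
        super().__delitem__(key)
        super().pop(value, None)


da = TwoWayDict({1: 'a', 2: 'b', 3: 'c'})
print(da.get(1), da.get('a'))
```

Because both directions live in the same underlying dict, keys() and values() return the same set of objects, and len() is twice the number of pairs.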

Back up to the main modules of the openbsd package:

>>> help(openbsd.arc4random)

NAME
    openbsd.arc4random

FILE
    /usr/local/lib/python2.5/site-packages/openbsd/arc4random.so

FUNCTIONS
    getbytes(...)
        Get some random bytes.

And the result -

>>> bytesx = openbsd.arc4random.getbytes(10)
>>> [bytex for bytex in bytesx]
['\xb4', '\xd1', '\x86', '\xb7', 'g', '8', '\x10', '}', '\x8b', '\xe5']
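Off OpenBSD, the standard library's os.urandom() is the nearest portable analogue. That is an assumption about what a reader might reach for, not something the openbsd package provides. In Python 3 the result is a bytes object, so iterating over it yields integers rather than one-character strings:

```python
import os

random_bytes = os.urandom(10)          # 10 bytes from the OS random source
print([hex(b) for b in random_bytes])  # values differ on every run
```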

One last module on a more common theme:

>>> help(openbsd.ifconfig)

NAME
    openbsd.ifconfig - A Python module for querying and manipulating network interfaces.

FILE
    /usr/local/lib/python2.5/site-packages/openbsd/ifconfig.py

CLASSES
    __builtin__.int(__builtin__.object)
        FlagVal
    __builtin__.object
        Flags
        IFConfig
        Interface
        MTU
        Media
        Metric
    exceptions.Exception(exceptions.BaseException)
        _ifconfig.IfConfigError

    class FlagVal(__builtin__.int)
     | Method resolution order:
(etc.)


>>> intx = openbsd.ifconfig.Interface('rl0')
>>> print intx
rl0: flags=8843 mtu 1500
         media: Ethernet autoselect
         link: 00:30:bd:72:6a:a0
         inet6: fe80:2::230:bdff:fe72:6aa0
         inet: 192.168.100.100
>>> dir(intx)
['Iftype', 'Name', '__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', '__weakref__', '_addrToStr', '_addrTypeLookup', '_getAddresses', '_getinfo', '_setflags', '_setmetric', '_setmtu', 'addAddress', 'addresses', 'delAddress', 'flags', 'media', 'metric', 'mtu', 'setAddress']
>>> intx.media
media: Ethernet autoselect
>>> intx.addresses
[{'address': {'sa_family': 18L, 'iftype': 'ETHER', 'address': '00:30:bd:72:6a:a0'}}, {'netmask': {'sa_family': 24L, 'address': 'ffff:ffff:ffff:ffff::'}, 'address': {'sa_family': 24L, 'address': 'fe80:2::230:bdff:fe72:6aa0'}}, {'netmask': {'sa_family': 0L, 'address': None}, 'dstaddr': {'sa_family': 2L, 'address': '192.168.100.255'}, 'address': {'sa_family': 2L, 'address': '192.168.100.100'}}]
>>>

ifconfig available within Python - sweet. rl0 is the ethernet device on my old Dell tower.
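The addresses list is plain Python data, so it is easy to pick apart. The sketch below copies the structure from the output above (without the Python 2 `L` suffixes) and pulls out the IPv4 entries. It assumes address family 2 means AF_INET, and the helper name `inet_addresses` is invented here.

```python
# Data copied from the intx.addresses output shown above.
addresses = [
    {'address': {'sa_family': 18, 'iftype': 'ETHER',
                 'address': '00:30:bd:72:6a:a0'}},
    {'netmask': {'sa_family': 24, 'address': 'ffff:ffff:ffff:ffff::'},
     'address': {'sa_family': 24, 'address': 'fe80:2::230:bdff:fe72:6aa0'}},
    {'netmask': {'sa_family': 0, 'address': None},
     'dstaddr': {'sa_family': 2, 'address': '192.168.100.255'},
     'address': {'sa_family': 2, 'address': '192.168.100.100'}},
]

AF_INET = 2  # assumed meaning of sa_family 2

def inet_addresses(entries, family=AF_INET):
    """Return the address strings whose sa_family matches family."""
    return [entry['address']['address'] for entry in entries
            if entry['address']['sa_family'] == family]

print(inet_addresses(addresses))
```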

Examination of the openbsd package shows that it has quite a bit to offer. If you're using OpenBSD, there's nothing stopping you from doing routine sysadmin tasks with Python. If not, now you've got a reason to check it out.

Python Modules for the BSD's

Well, for FreeBSD and OpenBSD, at least. I can't yet vouch for NetBSD and DragonFly BSD.

First, FreeBSD - the port is named py-freebsd. Once built, the module can be imported with "import freebsd".

[carl@pcbsd]/usr/local/lib/python2.6/site-packages(158)% python
Python 2.6.2 (r262:71600, Jun 24 2009, 23:31:28)
[GCC 4.2.1 20070719 [FreeBSD]] on freebsd7
Type "help", "copyright", "credits" or "license" for more information.
>>> import freebsd
>>> dir(freebsd)
['__doc__', '__file__', '__name__', '__package__', '__version__', 'chflags', 'const', 'fchflags', 'fstatfs', 'geom_getxml', 'getfsent', 'getfsfile', 'getfsspec', 'getfsstat', 'gethostname', 'getloadavg', 'getlogin', 'getosreldate', 'getpriority', 'getprogname', 'getpwent', 'getpwnam', 'getpwuid', 'getquota', 'getrlimit', 'getrusage', 'ifstats', 'ipstats', 'jail', 'kevent', 'kqueue', 'ktrace', 'lchflags', 'quotaoff', 'quotaon', 'quotasync', 'reboot', 'sendfile', 'sethostname', 'setlogin', 'setpriority', 'setproctitle', 'setprogname', 'setquota', 'setrlimit', 'statfs', 'sysctl', 'sysctldescr', 'sysctlmibtoname', 'sysctlnametomib', 'tcpstats', 'udpstats']

Not a bad collection of utilities. Let's take a couple for a test drive:

>>> freebsd.gethostname()
'pcbsd'

>>> freebsd.getprogname()
'python'
>>> help(freebsd.jail)
Help on built-in function jail in module freebsd:

jail(...)
jail(path, hostname, ip_number):
The jail() system call sets up a jail and locks the current process
in it. The ``path'' should be set to the directory which is to be
the root of the prison. The ``hostname'' can be set to the hostname
of the prison. This can be changed from the inside of the prison.
The ``ip_number'' can be set to the IP number assigned to the prison.

>>> # wow, you can set up a jail with python

>>> freebsd.ifstats()
>>> import pprint
>>> pprint.pprint(_)
{'bge0': {'addrlen': 6,
'baudrate': 100000000L,
'collisions': 0L,
'flags': 34883,
'hdrlen': 14,
'hwassist': 7L,
'ibytes': 19222590L,
'ierrors': 0L,
'imcasts': 577L,
'ipackets': 19728L,
'iqdrops': 0L,
'metric': 0L,
'mtu': 1500L,
'name': 'bge0',
'noproto': 0L,
'obytes': 2009038L,
'oerrors': 0L,
'omcasts': 0L,
'opackets': 13285L,
'pcount': 0,
'physical': 0,
'snd_drops': 0,
'snd_len': 0,
'snd_maxlen': 511,
'type': 6},

bge0 is the ethernet device on my Thinkpad.

>>> freebsd.getlogin()
'carl'
>>> freebsd.tcpstats()
>>> pprint.pprint(_)
{'accepts': 0L,
'badsyn': 0L,
'cachedrtt': 147L,
'cachedrttvar': 150L,
'cachedssthresh': 4L,
'closed': 495L,
'connattempt': 360L,
'conndrops': 20L,
'connects': 340L,
'delack': 277L,
'drops': 22L,
'keepdrops': 0L,
'keepprobe': 0L,
'keeptimeo': 0L,
'listendrop': 0L,
'mturesent': 0L,
'pawsdrop': 0L,
'persistdrop': 0L,
'persisttimeo': 0L,
'predack': 0L,
'preddat': 15226L,
'rcvackbyte': 1093284L,
'rcvackpack': 1848L,
'rcvacktoomuch': 0L,
'rcvafterclose': 7L,
'rcvbadoff': 0L,
'rcvbadsum': 0L,
'rcvbyte': 16595286L,
'rcvbyteafterwin': 0L,
'rcvdupack': 232L,
'rcvdupbyte': 88723L,
'rcvduppack': 77L,
'rcvoobyte': 1015050L,
'rcvoopack': 919L,
'rcvpack': 15882L,
'rcvpackafterwin': 0L,
'rcvpartdupbyte': 525L,
'rcvpartduppack': 2L,
'rcvshort': 0L,
'rcvtotal': 18489L,
'rcvwinprobe': 0L,
'rcvwinupd': 3L,
'rexmttimeo': 118L,
'rttupdated': 1817L,
'sc_aborted': 0L,
'sc_added': 0L,
'sc_badack': 0L,
'sc_bucketoverflow': 0L,
'sc_cacheoverflow': 0L,
'sc_completed': 0L,
'sc_dropped': 0L,
'sc_dupsyn': 0L,
'sc_recvcookie': 0L,
'sc_reset': 0L,
'sc_retransmitted': 0L,
'sc_sendcookie': 0L,
'sc_stale': 0L,
'sc_unreach': 0L,
'sc_zonefail': 0L,
'segstimed': 1688L,
'sndacks': 9261L,
'sndbyte': 1098259L,
'sndctrl': 697L,
'sndpack': 1252L,
'sndprobe': 0L,
'sndrexmitbyte': 2252L,
'sndrexmitpack': 2L,
'sndtotal': 12381L,
'sndurg': 0L,
'sndwinup': 1169L,
'timeoutdrop': 9L}

22 drops, 9 of them timeouts, and a bunch of other stuff too.
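Picking those numbers out by eye gets tedious; a small filter does it. This is a sketch in Python 3 using a handful of the drop counters copied from the output above (with the `L` suffixes removed); `nonzero_matching` is a hypothetical helper, not part of the freebsd module.

```python
# A few of the counters from the tcpstats output above.
stats = {'conndrops': 20, 'drops': 22, 'keepdrops': 0,
         'listendrop': 0, 'pawsdrop': 0, 'persistdrop': 0,
         'timeoutdrop': 9}

def nonzero_matching(stats, fragment):
    """Return counters whose name contains fragment and whose value is not zero."""
    return {name: value for name, value in sorted(stats.items())
            if fragment in name and value}

for name, value in nonzero_matching(stats, 'drop').items():
    print(name, value)
```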

Enough for today. Next time we'll take a quick look at the Python module for OpenBSD.