Sunday, December 26, 2010

POV-ray + Python - Scene Creation

Brief recap:  this recipe from the Python cookbook webpage is the basis for an api for POV-ray from Python.  I added an Object class to the api for more flexibility in manipulating pre-made POV-ray objects.

This final post in the POV-ray series deals with an attempt to create a scene.  I want to draw some pysanky eggs on a board with pegs in it for the eggs to stand on.

For the wood texture of the board I've used T_Wood10 from the POV-ray woods.inc library.  This requires the inclusion of the woods.inc library in the Python code:

file = pov.File('test5.pov', 'colors.inc', 'woods.inc')

test5.pov is my output file; colors.inc and woods.inc are my include files.

Because the api deals exclusively with text, the wood texture will have to be put in as a string literal complete with POV-ray's curly brackets:

testblock = pov.Box((-15, -6.50, -18), (15, -2.00, -3), 
    texture = '{T_Wood10}')

As for the scene itself:
This is good for illustrating the scene's design, but it's not very realistic.  The eggs look like they're wrapped in cellophane.  I tried introducing some focal blur:

cam = pov.Camera(location = (0, 5, -35), look_at = (0, 0, 2),
    focal_point = (0, 2, -10), aperture = 0.4, blur_samples = 20)

This looks more realistic, but blurry.  Focal blur would probably suit a remote corner of a scene better than the main object.

At this point it's really the nuances of POV-ray scene creation that will make a difference.  The purpose of this post and the last three was to show that you can do some fun things with POV-ray from Python.

Lastly, and just for fun, some wooden eggs (textures textures.inc/Cork, woods.inc T_Wood10 and T_Wood14):

Tuesday, December 21, 2010

Coding POV-ray in Python

My past two posts were basically on the same topic as this one.  The difference now is that I was able to write Python code as I would normally write it with functions, dictionaries, and loops and still come out with valid POV-ray code on the other end.  It's about a third of the length of the equivalent POV-ray code and, for me at least, easier to understand and read.

Here is the last part of the code that resulted in the scene of the three eggs below:

file = pov.File('test4.pov', 'colors.inc')
cam = pov.Camera(location = (-2, 3, -12), look_at = (0, 0, 2))
sorokolines = makesorokolines()
testblock = makeredwedges()
testblockred = pov.Object(testblock, RED)
testblock = makeblackwedges()
testblockblack = pov.Object(testblock, BLACK)
test = pov.Union(testblockred, testblockblack, sorokolines)
test1 = pov.Object(test, translate = (2, 2, 0))
test2 = pov.Object(test, translate = (-2, -2, 0))
test3 = pov.Object(test, translate = (-6, 2, 0))
file.write(cam, test1, test2, test3, light1, light2, light3, light4, light5)

Now that I've proven to myself that I can code this in Python, I'd like to try something more involved like putting the eggs on a surface with a texture.

Thanks for having a look.

Monday, December 13, 2010

More POV-ray

Since last time I've had some success porting POV-ray code to Python code based on this recipe by Simon Burton.

To briefly recap, I'm trying to reproduce this egg with Python code:

Since my first attempt, this is how far I've gotten:
Basically, the lines; I've made them a bit thicker to emphasize them.  I'm still working on getting the camera location and look_at values set to coincide with the main axes.

The pictures are all well and good, but what's pretty exciting is that I made the second one using the API from the recipe.  Further, I was able to expand on the recipe by adding an Object class:

class Object(Item):
  def __init__(self, *opts, **kwargs):
    Item.__init__(self, "object", (), opts, **kwargs)

This was really simple; I just copied what Simon Burton had done with the other classes.  Still, it opened up a lot of possibilities for twirling, flipping, and coloring elements once they're constructed.  For example, the code for the dividing lines above is (I've skipped the egg shape code for brevity):

white = pov.Texture(pov.Pigment(color = (1.0, 1.0, 1.0)),
    pov.Finish(phong = redphng, reflection = redrflct))
horizontaldividingline = pov.Box((-3, 2.15, -3), (3, 2.25, 3), white)
verticaldividingline = pov.Box((-0.05, -1, -3), (0.05, 7, 3), white)

# vectors for dividing lines
# down on y axis
movedown = (0, -2.2, 0)
# up on y axis
mvup = (0, 2.2, 0)
# flip right around z axis
flprt = (0, 0, -52.5)

# scale - same in all directions (2)
scal = (2, 2, 2)

# turn for display
trn15back = (0, -15, 0)

planes = [horizontaldividingline, verticaldividingline]

# vertical planes
EIGHTHTURN = 45
turn = 45
for counter in range(3):
    plane = pov.Object(verticaldividingline, rotate = (0, turn, 0))
    planes.append(plane)
    turn += EIGHTHTURN

# for dividing lines at angle to horizontal
QUARTERTURN = 90
flprtdivlinepre = pov.Object(horizontaldividingline,
    translate = movedown, rotate = flprt)
flprtdivline = pov.Object(flprtdivlinepre, translate = mvup)

planes.append(flprtdivline)
turn = 90
for counter in range(3):
    plane = pov.Object(flprtdivline, rotate = (0, turn, 0))
    planes.append(plane)
    turn += QUARTERTURN

sorokoplanes = pov.Union(*planes)

eggwhite = pov.Object(unionegg, white, scale = scal)

sorokolines = pov.Intersection(eggwhite, sorokoplanes)

sorokotest = pov.Object(sorokolines, translate = (0, -2.75, 0), rotate = trn15back)

The egg design is very symmetrical and lends itself to repetition.  I tried to use this to my advantage with the two loops.  The list unpacking also compresses the code a bit.
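
To make the repetition concrete, here is a small sketch of the rotation vectors the two loops above step through.  The helper name turns is made up for this illustration and is not part of the recipe; it uses nothing but plain Python.

```python
def turns(start, step, count):
    """Return y-axis rotation vectors: start, start + step, ... (count of them)."""
    return [(0, start + n * step, 0) for n in range(count)]

# the vertical planes loop: EIGHTHTURN steps starting at 45 degrees
vertical = turns(45, 45, 3)

# the angled dividing lines loop: QUARTERTURN steps starting at 90 degrees
angled = turns(90, 90, 3)

print(vertical)
print(angled)
```

Each tuple is what ends up in the rotate keyword of pov.Object on one pass through the loop.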

Next on the agenda is placing code in functions and classes.  This would allow for making multiple eggs of different colors with a single code call.

Sunday, December 5, 2010

POV-ray

I messed around with POV-ray a bit about five years ago and recently tried to resurrect some of that code.

There's a recipe for a POV-ray - Python API by Simon Burton out on ActiveState that I wanted to try.

Here is the shape I was trying to re-create with the Python API:
Literally, an Easter egg, a bit involved, but not overly complex.  The egg shape is borrowed from Friedrich Lohmüller's POV-ray site.

There is a simple example in the API which I've slightly modified to make a partially lit sphere:

# renamed recipe as pypov
import pypov as pov

file = pov.File('test2.pov', 'colors.inc')
cam = pov.Camera(location = (0, 1, -5), look_at = (0, -0.5, 2))
sphere = pov.Sphere((0, 0, 0), 1.5, pov.Texture(pov.Pigment(color = 'Blue')))
light = pov.LightSource((2, 4, -3), color = 'White')
file.write(cam, sphere, light)

This, after its output is run through POV-ray, yields this:
It won't win any animation awards, but it's pretty nonetheless.

The code for Lohmüller's egg shape looks like this:


# renamed recipe as pypov
import pypov as pov

file = pov.File('test3.pov', 'colors.inc')
cam = pov.Camera(location = (0, 1, -5), look_at = (0, -0.5, 2))
sphereupper = pov.Sphere((0, 0, 0), 1.0, pov.Texture(pov.Pigment(color = 'Blue')), scale = (1, 1.55, 1))
slabupper = pov.Box((-1, -1.55, -1), (1, 0, 1))
diffupper = pov.Difference(sphereupper, slabupper)
spherelower = pov.Sphere((0, 0, 0), 1.0, pov.Texture(pov.Pigment(color = 'Blue')), scale = (1, 1.15, 1))
slablower = pov.Box((-1, 0, -1), (1, 1.15, 1))
difflower = pov.Difference(spherelower, slablower)
union = pov.Union(difflower, diffupper, translate = (0, 0.55, 0), scale = 1.0)
light = pov.LightSource((2, 4, -3), color = 'White')
light2 = pov.LightSource((-2, -4, -3), color = 'White')
file.write(cam, union, light, light2)





and the output looks like this:


That is about as far as I got with the Python API.  The problems I was having were related to trying to shoehorn my POV-ray code into the API.  I added an Object class for the purpose of assigning attributes to predefined shapes.  The problem there is that you can't use the same keyword more than once (translate, then  rotate, then translate again).
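
The keyword limitation comes from Python itself rather than from the recipe.  Here is a minimal sketch of it, along with one possible way around it (nesting one wrapper inside another).  The wrap function is a hypothetical stand-in that only builds a string; it is not the recipe's Object class.

```python
def wrap(inner, **transforms):
    """Hypothetical stand-in: wrap some POV-ray text in an object block."""
    body = ' '.join('%s <%s>' % (name, ', '.join(str(n) for n in vec))
                    for name, vec in transforms.items())
    return 'object { %s %s }' % (inner, body)

# A call that repeats a keyword never runs; Python rejects it when compiling.
call = "wrap('egg', translate=(0, -2, 0), rotate=(0, 0, 45), translate=(0, 2, 0))"
try:
    compile(call, '<demo>', 'eval')
    repeated_ok = True
except SyntaxError:
    repeated_ok = False

# Nesting gives each translate its own wrapper instead.
step1 = wrap('egg', translate=(0, -2, 0), rotate=(0, 0, 45))
step2 = wrap(step1, translate=(0, 2, 0))
print(repeated_ok)
print(step2)
```

Whether nesting is the best answer depends on how deep the chain of transformations gets, but it does keep every call to a single use of each keyword.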



Going forward I plan to work with simpler shapes (merging two parts instead of 50 or so).  Also, I'll need to weigh what the API offers against its limitations.  It will not be a one-to-one code translation between POV-ray and Python.

Monday, October 25, 2010

Regular Expression Unicode Blocks in IronPython and jython

One last thing that's available in jython and IronPython, but not in CPython regular expressions is Unicode Blocks.  Blocks are similar to Unicode Scripts, but do not correspond one to one with them.  Blocks, as the name implies, represent contiguous ranges of Unicode code points.  This page, recommended to me by artisonian on twitter, has a good synopsis.

Where Unicode Blocks are most useful (where they correspond best with Unicode Scripts) is in the South Asian languages (India and vicinity).  Here is some code written for the detection of Bengali characters in a string in IronPython and jython.  The syntax is similar:  .NET spells the block \p{IsBengali}, while Java spells it \p{InBengali}.

Iron Python

/bin/mono /home/carl/IronPython-2.0.3/ipy.exe
IronPython 2.0.3 (2.0.0.0) on .NET 2.0.50727.1433
Type "help", "copyright", "credits" or "license" for more information.
>>> from System.Text import RegularExpressions as regex
>>> fle = open('bengalisnippet', 'r')
>>> linex = fle.readline()
>>> fle.close()
>>> rex = regex.Regex(r'\p{IsBengali}+')
>>> linex = linex.decode('utf-8')
>>> mtchx = rex.Match(linex)
>>> mtchx.ToString()
u'\u0995\u09bf\u099b\u09c1'
>>> mtchx.Success
True
>>>

jython

Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
[OpenJDK Client VM (Sun Microsystems Inc.)] on java1.7.0-internal
Type "help", "copyright", "credits" or "license" for more information.
>>> from java.util import regex
>>> rex = regex.Pattern.compile(r'\p{InBengali}+')
>>> fle = open('bengalisnippet', 'r')
>>> linex = fle.readline()
>>> linex = linex.decode('utf-8')
>>> mtchx = rex.matcher(linex)
>>> mtchx
java.util.regex.Matcher[pattern=\p{InBengali}+ region=0,5 lastmatch=]
>>> mtchx.find()
True
>>> mtchx.start()
0
>>> mtchx.end()
4
>>> linex
u'\u0995\u09bf\u099b\u09c1\n'
>>>
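
For comparison, CPython's re module can get close to a block test with an explicit code point range, since a block is just a contiguous range.  This is only a sketch; the range U+0980 to U+09FF is the Bengali block from the Unicode charts, and the sample string is the same word used in the sessions above.

```python
import re

# the Bengali block spelled out as a character class
BENGALI_BLOCK = re.compile('[\u0980-\u09ff]+')

line = '\u0995\u09bf\u099b\u09c1\n'
match = BENGALI_BLOCK.search(line)
print(match.span())
```

The span covers the four Bengali characters and stops before the newline, which lines up with the start and end values reported by jython.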

Saturday, October 23, 2010

IronPython, unicode, and regular expressions

This is a quick follow on to my last post on jython.

The basic idea is that .NET has the capability to search for characters belonging to Unicode general categories (in this case Mn, the non-spacing mark category).

IronPython 2.0.3 (2.0.0.0) on .NET 2.0.50727.1433
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> from System.Text import RegularExpressions as regex
>>> nonspacingx = regex.Regex(r'\p{Mn}')
>>> ns = unichr(0x9C1)
>>> ns
u'\u09c1'
>>> nonspacingx.Match(ns)
<System.Text.RegularExpressions.Match object at 0x000000000000002B [?]>
>>> ns = u'a' + ns
>>> ns
u'a\u09c1'

>>> mtchx = nonspacingx.Match(ns)
<System.Text.RegularExpressions.Match object at 0x000000000000002C [?]>
>>> mtchx.ToString()
u'\u09c1'
>>> mtchx.Index
1
>>> mtchx.Length
1
>>> mtchx.Success
True
>>> 

Although the names are different, Java and .NET both provide a means of using general categories in regular expressions.  Match in .NET matches occurrences within the string, not just at the start.  Success is the boolean value indicating a match.
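
CPython's re module has no \p{Mn}, but the unicodedata module can stand in for simple cases.  A rough sketch, with find_category being a name invented for this example:

```python
import unicodedata

def find_category(text, category='Mn'):
    """Return (index, character name) pairs for characters in the category."""
    return [(i, unicodedata.name(ch, '<unnamed>'))
            for i, ch in enumerate(text)
            if unicodedata.category(ch) == category]

hits = find_category('a\u09c1')
print(hits)
```

The index in the result plays the same role as mtchx.Index in the IronPython session.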

As an aside, the unicodedata module referenced in the jython post is available for IronPython.  It is not in the download for either IronPython or FePy, but is available as a separate download from the FePy site.                               

jython, regular expressions, and unicode

Jython enables access to Java's regular expression classes and methods.  One feature of Java's regular expression library that Python does not have is the ability to search on Unicode general categories (http://unicode.org/Public/UNIDATA/PropertyValueAliases.txt).  These are abbreviations:  Mn = non-spacing mark, Lu = uppercase letter, etc.

Here is a quick example for Mn (non-spacing).

$ /usr/local/jdk-1.7.0/bin/java -jar jython.jar
Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
[OpenJDK Client VM (Sun Microsystems Inc.)] on java1.7.0-internal
Type "help", "copyright", "credits" or "license" for more information.
>>> from java.util import regex
>>> nonspacingx = regex.Pattern.compile(r'\p{Mn}')
>>> import unicodedata
>>> for charcode in range(0x900, 0xA00):
...     mtchx = nonspacingx.matcher(unichr(charcode))
...     if mtchx.matches():
...         print 'match at character %X, %s' % (charcode, unicodedata.name(unichr(charcode)))
...
match at character 901, DEVANAGARI SIGN CANDRABINDU
match at character 902, DEVANAGARI SIGN ANUSVARA
match at character 93C, DEVANAGARI SIGN NUKTA
match at character 941, DEVANAGARI VOWEL SIGN U
match at character 942, DEVANAGARI VOWEL SIGN UU
match at character 943, DEVANAGARI VOWEL SIGN VOCALIC R
match at character 944, DEVANAGARI VOWEL SIGN VOCALIC RR
match at character 945, DEVANAGARI VOWEL SIGN CANDRA E
match at character 946, DEVANAGARI VOWEL SIGN SHORT E
match at character 947, DEVANAGARI VOWEL SIGN E
match at character 948, DEVANAGARI VOWEL SIGN AI
match at character 94D, DEVANAGARI SIGN VIRAMA
match at character 951, DEVANAGARI STRESS SIGN UDATTA
match at character 952, DEVANAGARI STRESS SIGN ANUDATTA
match at character 953, DEVANAGARI GRAVE ACCENT
match at character 954, DEVANAGARI ACUTE ACCENT
match at character 962, DEVANAGARI VOWEL SIGN VOCALIC L
match at character 963, DEVANAGARI VOWEL SIGN VOCALIC LL
match at character 981, BENGALI SIGN CANDRABINDU
match at character 9BC, BENGALI SIGN NUKTA
match at character 9C1, BENGALI VOWEL SIGN U
match at character 9C2, BENGALI VOWEL SIGN UU
match at character 9C3, BENGALI VOWEL SIGN VOCALIC R
match at character 9C4, BENGALI VOWEL SIGN VOCALIC RR
match at character 9CD, BENGALI SIGN VIRAMA
match at character 9E2, BENGALI VOWEL SIGN VOCALIC L
match at character 9E3, BENGALI VOWEL SIGN VOCALIC LL
>>> 

That's nice, but how about a less contrived example?  I got a Bengali word off one of the links on the BengaliLanguage page on the Python Wiki.  The word is saved to a file bengalisnippet.
All I have to do is open the file, get the line and let my regex rip, right?

>>> fle = open('bengalisnippet', 'r')
>>> linex = fle.readline()
>>> linex = linex.decode('utf-8')
>>> mtchx = nonspacingx.matcher(linex)
>>> mtchx.matches()
False
>>>

Um, no.

Let's investigate and try this again.
>>> linex
u'\u0995\u09bf\u099b\u09c1\n'
>>> unicodedata.category(linex[0])
'Lo'
>>> unicodedata.category(linex[1])
'Mc'
>>> unicodedata.category(linex[2])
'Lo'
>>> unicodedata.category(linex[3])
'Mn'

OK, the character we're looking for is the last one (except for the newline character).

>>> mtchx.find()
True
>>> mtchx.start()
3
>>> mtchx.end()
4
>>>

I was using the wrong method (matches).  find is analogous to search in Python.
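
The same distinction can be sketched in CPython, where fullmatch behaves like Java's matches (the whole string has to match) and search behaves like find.  Since re has no \p{Mn}, the character class below simply lists the Bengali non-spacing characters from the loop output earlier in this post; treat it as an approximation for this one example.

```python
import re

# Bengali Mn characters taken from the jython output above
BENGALI_MN = re.compile('[\u0981\u09bc\u09c1-\u09c4\u09cd\u09e2\u09e3]')

linex = '\u0995\u09bf\u099b\u09c1\n'

whole = BENGALI_MN.fullmatch(linex)   # like matches(): needs the entire string
found = BENGALI_MN.search(linex)      # like find(): scans for an occurrence

print(whole)
print(found.span())
```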

For me the utility of this is being able to determine if characters are rendering correctly.  I can locate the trouble spots in an unfamiliar language's script and investigate them (combining and non-spacing characters don't always show up correctly).

Java regular expressions are a bit more involved than Python's.  This is one case where the extra effort required may be worth the trouble.
                                       

Monday, October 18, 2010

java.lang.String.matches method

Recently I've been working on learning regular expressions.  Something about the Java implementation (in jython) I found curious.

Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
[OpenJDK Client VM (Sun Microsystems Inc.)] on java1.7.0-internal
Type "help", "copyright", "credits" or "license" for more information.
>>> from java.lang import String
>>> teststring = String('def hello():')
>>> teststring.matches(r'\s*def\s+\w*\(\):$')
True
>>>  

Python has the re.match and re.search functions.  C# has something similar.  String.matches just seemed like a strange, less efficient construct:  it succeeds only if the whole string matches the pattern, and presumably the regular expression gets compiled again on every call rather than once up front.  Go figure.

Sunday, October 3, 2010

Second Javascript Attempt - the Zen of Python (again)

Last time, in my enthusiasm, I published some not ready for prime time html/JavaScript code.  Since then the W3C validator has helped me to see the error of my ways.  This is my second shot at making the first part of the Zen of Python magically appear in a web browser:




<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
<TITLE>
The Zen of Python, by Tim Peters
</TITLE>
<SCRIPT TYPE="text/javascript">
<!--

// function to put keys into Array
getkeysx = function(hashx) {
var keysy = [];
for (keyz in hashx) {
keysy.push(keyz);
}
return keysy;
}

addheaderx = function(headerid) {
var dx = document.createElement("DIV");
dx.id = 'div' + headerid;
var hx = document.createElement("H2");
hx.id = headerid;
hx.style.textAlign = 'center';
hx.style.fontFamily = 'sans-serif';
document.getElementById('bodyx').appendChild(dx);
document.getElementById('div' + headerid).appendChild(hx);
document.getElementById(headerid).innerHTML = '';
}

var timex = new Date();
var objx = {timetowork:timex,
number1:'Beautiful is better than ugly.',
number2:'Explicit is better than implicit.',
number3:'Simple is better than complex.',
number4:'Complex is better than complicated.',
number5:'Flat is better than nested.',
number6:'Sparse is better than dense.',
number7:'Readability counts.'};

var INTERVALX = 3000;

addheaders = function() {
var i = 1;
for (keyn in objx) {
addheaderx('header' + i);
i++;
}
}

var keysx = getkeysx(objx);
var keytracker = 0;
var colortracker = 0;
colorsx = ['red', 'green', 'blue', 'black', 'indigo',
'deeppink', 'darkslategray',
'darkmagenta', 'darkturquoise']

var counter = 1;
// function to write key-value pair to text box
writeprop = function() {
objx.timetowork = Date();
if (keytracker >= keysx.length) {
for (var j = 1; j < keysx.length + 1; j++) {
document.getElementById('header' + j).innerHTML = '';
}
keytracker = 0;
}
if (colortracker >= colorsx.length) {
colortracker = 0;
}
if (counter >= keysx.length + 1) {
counter = 1;
}
document.getElementById('header' + counter).innerHTML = objx[keysx[keytracker]];
document.getElementById('header' + counter).style.color = colorsx[colortracker];
setTimeout('writeprop()', INTERVALX);
keytracker++;
colortracker++;
counter++;
}

doall = function() {
addheaders();
writeprop();
}

// -->
</SCRIPT>
</HEAD>
<BODY ID = "bodyx" ONLOAD = "setTimeout('doall()', INTERVALX);">
<H1 ID = "zen" STYLE = "text-align:center;font-family:sans-serif">
THE ZEN OF PYTHON
</H1>
</BODY>
</HTML>

Friday, October 1, 2010

JavaScript attempt - the Zen of Python

I just completed a JavaScript course and couldn't resist messing with a web page (html file).  This rotates through the first part of the Zen of Python at five second intervals (warning - newbish code):

<HTML>
<DOCUMENT>
<HEAD>
<TITLE>
The Zen of Python, by Tim Peters
</TITLE>
<SCRIPT LANGUAGE="JavaScript">
<!--
var STYLEX = "border-width:0px;";
STYLEX += "border-style:solid;";
STYLEX += "font-family:sans-serif;";
STYLEX += "color:blue";

// function to put keys into Array
getkeysx = function(hashx) {
    var keysy = [];
    for (keyz in hashx) {
        keysy.push(keyz);
    }
    return keysy;
}

addbr = function() {
    var brx = document.createElement("BR");
    document.formx.appendChild(brx);
}

addx = function(idx) {
    addbr();
    var textx = document.createElement("INPUT");
    textx.type = ("TEXT");
    textx.value = "";
    textx.id = idx;
    textx.size = 50;
    textx.readonly = 'readonly';
    textx.style.borderWidth = '0px';
    textx.style.fontFamily = 'sans-serif';
    textx.style.fontSize = '2.75em';
    textx.style.color = 'blue';
    textx.style.textAlign = 'center';
    document.formx.appendChild(textx);
    addbr();
    addbr();
    addbr();
}

var timex = new Date();
var objx = {timetowork:timex,
number1:'Beautiful is better than ugly.',
number2:'Explicit is better than implicit.',
number3:'Simple is better than complex.',
number4:'Complex is better than complicated.',
number5:'Flat is better than nested.',
number6:'Sparse is better than dense.',
number7:'Readability counts.'};

addtextboxes = function() {
    var i = 1;
    for (keyn in objx) {
        addx('text' + i);
        i++;
    }
}

var keysx = getkeysx(objx);
var keytracker = 0;
var colortracker = 0;
colorsx = ['red', 'green', 'blue', 'black', 'indigo',
           'deeppink', 'darkslategray',
           'darkmagenta', 'darkturquoise']

var counter = 1;
// function to write key-value pair to text box
writeprop = function() {
    objx.timetowork = Date();
    if (keytracker >= keysx.length) {
        for (var j = 1; j < keysx.length + 1; j++) {
            document.formx['text' + j].value = '';
        }
        keytracker = 0;
    }
    if (colortracker >= colorsx.length) {
        colortracker = 0;
    }
    if (counter >= keysx.length + 1) {
        counter = 1;
    }
    document.formx['text' + counter].value = objx[keysx[keytracker]];
    document.formx['text' + counter].style.color = colorsx[colortracker];
    setTimeout('writeprop()', 5000);
    keytracker++;
    colortracker++;
    counter++;
}

doall = function() {
    addtextboxes();
    writeprop();
}

// -->
</SCRIPT>
</HEAD>
<BODY ONLOAD = "setTimeout('doall()', 3000);">
<FORM NAME = "formx">
</FORM>
</BODY>
</DOCUMENT>
</HTML>

Monday, August 23, 2010

More RTL Python Editor

Last time I tried to introduce the idea of an RTL Python editor for RTL languages like Arabic, Persian, Urdu, and Hebrew.

The idea is a little further along.  I've gotten some Arabic and Urdu fonts installed.  Also, the editor is capable of handling dictionaries, lists, tuples, and classes (there is code for brace type characters and an Arabic comma).

Here is a screenshot for a function that is all in Arabic except for the Python keywords (apologies to the Arabic speaking readership - the words are probably nonsense, as I just cut and pasted them randomly from the Python wiki's Arabic page):


The editor is written in Java.  I'm working on writing the transfer to and from interpretable Python code in jython.

The approach I've taken is, if not a brute force one, definitely a forced one.  Trying to mix and match left to right with bidirectional and right to left is tricky.  To get around this I treat everything between whitespace and special characters (braces, colons, periods) as individual pieces of text.   I further separate the text from whitespace and special characters through the use of the non-spacing, invisible Unicode character 200e.  That character forces the editor back into left to right mode between words.  Time will tell if this was a viable design decision.
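
A rough Python sketch of that idea follows.  The editor itself is written in Java, so this is not its code; the separator set and the function names are assumptions made for illustration, and the sample identifiers are arbitrary.

```python
import re

LRM = '\u200e'  # LEFT-TO-RIGHT MARK

# whitespace runs plus a few special characters, including the Arabic comma
SEPARATORS = re.compile('(\\s+|[{}\\[\\]():.,\u060c])')

def split_pieces(line):
    """Split a line into words, whitespace runs, and special characters."""
    return [piece for piece in SEPARATORS.split(line) if piece]

def add_marks(line):
    """Put a left-to-right mark between every pair of neighboring pieces."""
    return LRM.join(split_pieces(line))

def strip_marks(line):
    """Remove the marks again to get interpretable code back."""
    return line.replace(LRM, '')

sample = 'def \u0642\u064a\u0645\u0629(\u0623, \u0628):'
marked = add_marks(sample)
print(len(split_pieces(sample)), marked.count(LRM))
```

Stripping the marks gives back the original line, which is the property the transfer to and from interpretable Python code depends on.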

For now it's one more step away from vaporware and towards releasable open source software.

Tuesday, March 9, 2010

Attempt at an RTL Editor for Python

In previous posts, I've written about the possibilities offered by Python 3.1's Unicode identifier capability, as well as the new challenges posed when one tries to display them on screen.

As the final project for my Java course, I set about trying to create an editor that would allow the user to enter text right to left, but save it as valid interpretable (is that a word?) Python 3.x code:


There's still a good deal of work to be done on this, but this is a start.  Hopefully I'll have something more robust next time.

Wednesday, February 24, 2010

Unicode Poster From Pycon 2010 Up

I posted my poster to Slideshare in Open Office Presentation format.  The file is about 13 or 14 megs in size.  The embedded poster was in png format to preserve the shape of the foreign glyphs.

A number of people asked me at the poster session for sources.  Here are the main ones:

1) Wikipedia - perhaps not the ultimate authority on all things, but a good place to research foreign languages and scripts.

2) the O'Reilly book Fonts and Encodings by Haralambous.  If you know little about Unicode and fonts, this is the next best thing to Knuth.

3) the Python 3.1 interpreter and the unicodedata module.  Once you get the basics of Unicode down, the unicodedata module has most of what you'll need.

4) Google searches and language promotion websites - laoconnection is a site that comes to mind.  Most people are proud of their languages and culture and want to share them.

Thanks to everyone who stopped by the poster.  That was fun.

Tuesday, February 23, 2010

FOSS Conference Economics

Just got back from Pycon - great show.

I've had some time to reflect on how to make conference going affordable, and where my money goes.  This year I was partially funded by my employer.  I was quite grateful, as I wasn't expecting anything.

What has concerned me in the past is the amount of money put out on travel and hotels.  If you're inside the US, Pycon(US) will see the biggest chunk of your money going to the hotel.  This is where people (or at least me) say, "Hey, wait a second, all my monetary support for the Open Source Software movement is going to the hotel industry!"  Not so fast - actually, although your money doesn't support FOSS directly, it does keep it from *losing* money.  To secure a hotel/convention facility for more than 1000 people, there has to be a commitment on rooms.  I've seen other devs stay at cheaper hotels for conferences - this is a good approach, if it's done out of necessity.  I generally try to stay at the conference hotel in order to support the continued success of the conference - to make sure the conference doesn't lose money.

The travel argument goes roughly the same way - you can't have a conference if people don't show.  Even though most of your money is going to the airlines (in the case of Pycon(US) for those outside the United States), your attendance is a plus.

Sunday, February 7, 2010

Handling UnicodeEncodeError in the Console (Python 3.1)

I've been working with a lot of different foreign scripts for the past six months or so.  Ideally I like to work in the console where possible.  An error that always comes up is the following:

[carl@pcbsd]/home/carl(139)% python3.1
Python 3.1.1 (r311:74480, Jan 17 2010, 23:15:26)
[GCC 4.2.1 20070719 [FreeBSD]] on freebsd7
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u0400')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u0400' in position 0: ordinal not in range(128)
>>>


After a while this can get pretty annoying.  There's a number of ways to get around the problem.  I don't know much about most of the languages I'm dealing with, so I prefer the Unicode code charts' capitalized ASCII descriptions to glyphs or empty boxes.  Fortunately the unicodedata module has all this information available.

To get the output I wanted I came up with a little script:

# mockprint.py - wrapper around print
# function to handle
# UnicodeEncodeError exceptions


# python 3.1


import unicodedata


ERRORSTR = "'ascii' codec can't encode character "
CHARIDX = 5
POSITIDX = 8
POSITIDX2 = 7


def mockprint(stringx):
    """
    Wrapper for print() function that
    replaces unprintable characters
    with their Unicode names.
    """
    try:
        print(stringx)
    except UnicodeEncodeError as e:
        # main cases:
        # 1) one character can't be printed
        # 2) multiple characters in a row can't be printed
        # 3) unicode character is first or last in string
        # 4) other ascii characters surround the unicode ones
        reasonx = str(e)
        reasonx = reasonx.split(' ')
        idx = reasonx[POSITIDX]
        # more than 1 char in a row can't be printed
        if idx == 'ordinal':
            # position looks like '5-8:'; take the whole start index,
            # not just its first digit
            idx = int(reasonx[POSITIDX2].split('-')[0])
            if idx != 0:
                print(stringx[:idx])
            print(unicodedata.name(stringx[idx]))
            mockprint(stringx[(idx + 1):])
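        # offending character shows up after ascii chars
        elif len(stringx) > 1:
            # position looks like '5:'; use the whole number
            idx = int(idx.rstrip(':'))
            # print the ascii text that precedes the offending character
            if idx != 0:
                print(stringx[:idx])
            charx = int(reasonx[CHARIDX][3:-1], 16)
            charx = chr(charx)
            print(unicodedata.name(charx))
            mockprint(stringx[(idx + 1):])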
        # end of the line 
        elif len(stringx) == 1:
            charx = int(reasonx[CHARIDX][3:-1], 16)
            charx = chr(charx)
            print(unicodedata.name(charx))

A quick demo:

>>> import mockprint
>>> mockprint.mockprint('hello\u0401\u0402\u0403\u0404world')
hello
CYRILLIC CAPITAL LETTER IO
CYRILLIC CAPITAL LETTER DJE
CYRILLIC CAPITAL LETTER GJE
CYRILLIC CAPITAL LETTER UKRAINIAN IE
world

And something a bit more challenging:

 

A few foreign words in a number of different languages.

>>> fle = open('/home/carl/pythonblog/foreignbytestest', 'rt', encoding = 'UTF-8')
>>> import mockprint                                                             
>>> for linex in fle.readlines():                                                
...     mockprint.mockprint(linex)                                               
...                                                                              
CJK UNIFIED IDEOGRAPH-65E5                                                       
CJK UNIFIED IDEOGRAPH-672C                                                       
CJK UNIFIED IDEOGRAPH-8A9E                                                       




abcde


ETHIOPIC SYLLABLE GLOTTAL A
ETHIOPIC SYLLABLE MAA     
ETHIOPIC SYLLABLE RE      
ETHIOPIC SYLLABLE NYAA    




ARMENIAN CAPITAL LETTER HO
ARMENIAN SMALL LETTER AYB
ARMENIAN SMALL LETTER YI 
ARMENIAN SMALL LETTER ECH
ARMENIAN SMALL LETTER REH
ARMENIAN SMALL LETTER ECH
ARMENIAN SMALL LETTER NOW




ORIYA LETTER O
ORIYA LETTER DDA
ORIYA SIGN NUKTA
ORIYA VOWEL SIGN I
ORIYA LETTER AA




LAO LETTER PHO TAM
LAO VOWEL SIGN AA
LAO LETTER SO SUNG
LAO VOWEL SIGN AA
LAO LETTER LO LOOT
LAO VOWEL SIGN AA
LAO LETTER WO




CYRILLIC SMALL LETTER ER
CYRILLIC SMALL LETTER U
CYRILLIC SMALL LETTER ES
CYRILLIC SMALL LETTER ES
CYRILLIC SMALL LETTER KA
CYRILLIC SMALL LETTER I
CYRILLIC SMALL LETTER SHORT I


CYRILLIC SMALL LETTER YA
CYRILLIC SMALL LETTER ZE
CYRILLIC SMALL LETTER YERU
CYRILLIC SMALL LETTER KA

Well, if that isn't beautiful, I don't know what is.

Seriously, this is a hack - parsing an error string and working backwards?  I've got to be joking.  Actually, no.  For as much time as I've spent remembering after the fact that I can't print Unicode in the console, this is worth it, even if it's only good for Python 3.1.
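
For what it's worth, the exception object carries the same information as attributes (object, start, end), so a variation that skips the string parsing is possible.  Here is a sketch of that variation, not a replacement for the script above; describe is a made-up name, and it encodes to ASCII directly instead of relying on the console's encoding.

```python
import unicodedata

def describe(text, encoding='ascii'):
    """Return printable runs as-is and other characters by Unicode name."""
    lines = []
    while text:
        try:
            text.encode(encoding)
        except UnicodeEncodeError as err:
            # everything before err.start encoded without trouble
            if err.start:
                lines.append(text[:err.start])
            # err.start:err.end is the run that could not be encoded
            for ch in text[err.start:err.end]:
                lines.append(unicodedata.name(ch, 'U+%04X' % ord(ch)))
            text = text[err.end:]
        else:
            lines.append(text)
            text = ''
    return lines

for linex in describe('hello\u0401\u0402\u0403\u0404world'):
    print(linex)
```

This should give the same lines as the quick demo above, without depending on the exact wording of the error message.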
  

Tuesday, February 2, 2010

py-openbsd's DoubleAssociation

I briefly covered this structure last time, but didn't do it justice.  The idea of a two-way dictionary structure (keys and values are both keys) intrigued me. I wanted to give it a spin with a real world example.

I've chosen a simple example with a few domain names (common names) and ip addresses:

# dblassoc.py

import openbsd

# some ip addresses paired with domains
ips = {'google':(0x4a7d1393, 0xd8ef3d68),
       'openbsd':(0x8ef40c2a,),
       'freebsd':(0x45935321,),
       'yahoo':(0xd1bf5d34, 0xd183249e)}

# OK, we can now make the DoubleAssociation
ipsbothways = openbsd.utils.DoubleAssociation(ips)

print "ipsbothways['yahoo'] = " + str(ipsbothways['yahoo'])

# fair enough, but nothing we couldn't get from the dictionary

# try to query on an ip address to get a domain name
print "ipsbothways[(0x8ef40c2a,)] = " + ipsbothways[(0x8ef40c2a,)]

# unlike a normal dictionary, DoubleAssociation gives everything
# back with the keys() method

for keyx in ipsbothways.keys():
    try:
        if 0xd183249e in keyx:
            print "domain is " + ipsbothways[keyx]
            break
    except TypeError:
        print "TypeError:  " + keyx

Python 2.5.4 (r254:67916, Jul  1 2009, 11:37:21)
[GCC 3.3.5 (propolice)] on openbsd4
Type "help", "copyright", "credits" or "license" for more information.
>>> import dblassoc
ipsbothways['yahoo'] = (3518979380L, 3515032734L)
ipsbothways[(0x8ef40c2a,)] = openbsd
TypeError:  google
TypeError:  openbsd
TypeError:  yahoo
domain is yahoo
>>>

The one thing you have to look out for is the treatment of everything in the structure as a key - that's why I had to catch the TypeError.  Everything is a value, too.  The values and keys methods yield the same results.

In real life, if you had 30 or 50 or 1000 ip addresses, this would come in handy.  Likewise for doctor-patient records, etc. (although the grouping of patients has to be unique, so that may not work after all - best to test both "sides" of the structure for exclusivity).
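
To show what testing both "sides" might look like, here is a plain-dictionary sketch of a two-way structure.  It is not how py-openbsd implements DoubleAssociation; make_two_way is a hypothetical helper written for Python 3 that refuses any item that would show up twice.

```python
def make_two_way(mapping):
    """Build a dict in which every key and every value can be looked up."""
    both = {}
    for key, value in mapping.items():
        # an item seen before, on either side, would overwrite an entry
        for item in (key, value):
            if item in both:
                raise ValueError('%r is not unique' % (item,))
        both[key] = value
        both[value] = key
    return both

ips = {'google': (0x4a7d1393, 0xd8ef3d68),
       'openbsd': (0x8ef40c2a,),
       'freebsd': (0x45935321,),
       'yahoo': (0xd1bf5d34, 0xd183249e)}

both = make_two_way(ips)

# look for the domain that owns one particular address
domain = next(both[k] for k in both
              if isinstance(k, tuple) and 0xd183249e in k)
print(domain)
```

Checking isinstance first avoids the TypeError that the loop above had to catch.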