pyright: 2014

Tuesday, November 25, 2014

Polygon Offset Using Vector Math in IronPython

The other day I saw something retweeted by @leppie (I think) about an experimental, hyper-fast, vector-math-driven 3D engine for the dot Net Framework. This led me to investigate whether there is a default implementation of vector math in the dot Net Framework. As it turns out, there is.

This is of interest because (I think) this would make IronPython the only Python implementation that has vector math included without having to install a third party library. Java has a java.util.Vector class, but it has nothing to do with vector math (it's a specialized array). You do need to use the dot Net Framework instead of standard Python modules, but if you're running IronPython, you should have access to that anyway.

The whole idea, or at least a big part of it, of running a Python implementation against the dotNet Framework is that you can leverage the power of that big library collection with a language that's fairly dense, easy, and doesn't require compilation.

This was pretty easy on Windows. The only confusing part is that there are two namespaces in dot Net called System.Windows. You want the one that references the WindowsBase dll. This is the one that has our Vector object in it.

The code follows, including the plotting with gnuplot (I had to download the Windows version). I left out the monastery.py file with the original shape points in it. The writetofile.py file is almost exactly like the one from the previous post, except that for a Vector object the X and Y property names are capitalized:

# vecipy.py

"""
Polygon offset problem using
dot Net Framework.
"""

import clr

WINX = 'WindowsBase'

clr.AddReference(WINX)

from System.Windows import Vector

import math
import copy

import monastery as pic

OFFSET = 0.15

def scaleadd(origin, offset, vectorx):
    """
    From a Vector representing the origin,
    a scalar offset, and a Vector, returns
    a Vector object representing a point
    offset from the origin.

    (Multiply vectorx by offset and add to origin.)
    """
    # Multiply method that takes scalar and Vector.
    multx = Vector.Multiply(vectorx, offset)
    return Vector.Add(multx, origin)

def getinsetpoint(pt1, pt2, pt3):
    """
    Given three points that form a corner (pt1, pt2, pt3),
    returns a point offset distance OFFSET to the right
    of the path formed by pt1-pt2-pt3.

    pt1, pt2, and pt3 are 2-tuples.

    Returns a Vector object.
    """
    origin = Vector(*pt2)
    v1 = Vector(pt1[0] - pt2[0], pt1[1] - pt2[1])
    v1.Normalize()

    v2 = Vector(pt3[0] - pt2[0], pt3[1] - pt2[1])
    v2.Normalize()

    v3 = copy.copy(v1)

    v1 = Vector.CrossProduct(v1, v2)

    v3 = Vector.Add(v3, v2)
    v3.Normalize()

    # In dotNet - Vector.Multiply is overloaded.
    # When it gets two Vector objects as arguments
    # it returns a dot product.
    cs = Vector.Multiply(v3, v2)

    # Again multiplication is overloaded.
    # Here it gets a scalar and a Vector
    # as arguments.
    a1 = Vector.Multiply(cs, v2)
    a2 = Vector.Subtract(v3, a1)

    if cs > 0:
        alpha = math.sqrt(a2.LengthSquared)
    else:
        alpha = -math.sqrt(a2.LengthSquared)

    if v1 < 0.0:
        return scaleadd(origin, -1.0 * OFFSET/alpha, v3)
    else:
        return scaleadd(origin, OFFSET/alpha, v3)

def generatepoints():
    """
    Create list of offset points
    for points inset from polygon.

    Return list.
    """
    polyinset = []
    lenpolygon = len(pic.MONASTERY)
    i = 0
    poly = pic.MONASTERY
    while i < lenpolygon - 2:
        polyinset.append(getinsetpoint(poly[i],
                         poly[i + 1], poly[i + 2]))
        i += 1
    polyinset.append(getinsetpoint(poly[-2],
                     poly[0], poly[1]))
    polyinset.append(getinsetpoint(poly[0],
                     poly[1], poly[2]))

    return polyinset

# writetofile.py

"""
Write vector points to file.

Show in gnuplot.
"""

import vecipy as vecx
import os

# We're using gnuplot.
# It doesn't like commas, so
# we'll use whitespace (6).
FMT = '{0:30.28f} {1:30.28f}'
FILEX = 'points'
ORIGSHAPE = 'originalshape'

PLOTCMD = 'set xrange[0.0:6.0]\n'
PLOTCMD += 'set yrange[0.0:6.0]\n'
PLOTCMD += 'plot "{0:s}" with lines lt rgb "red" lw 4, '
PLOTCMD += '"{1:s}" with lines lt rgb "blue" lw 4'
GNUPLOTFILE = 'plotfile'
GNUPLOT = 'gnuplot -p {:s}'.format(GNUPLOTFILE)

pts = vecx.generatepoints()
f = open(FILEX, 'w')
i = 1
for ptx in pts:
    print('Printing point {0:d} . . .'.format(i))
    print >> f, FMT.format(ptx.X, ptx.Y)
    i += 1
f.close()

# Plot original as well.
i = 0
f = open(ORIGSHAPE, 'w')
for ptx in vecx.pic.MONASTERY:
    print('Printing point {0:d} of original shape . . .'.format(i))
    print >> f, FMT.format(*ptx)
    i += 1
f.close()

f = open(GNUPLOTFILE, 'w')
print >> f, PLOTCMD.format(ORIGSHAPE, FILEX)
f.close()
os.system(GNUPLOT)
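
For anyone without the dot Net Framework handy, the two flavors of the overloaded Vector.Multiply used in getinsetpoint can be mimicked in plain Python. This is only a sketch: the Vec2 tuple and the multiply function are names made up for illustration and are not part of System.Windows.

```python
from collections import namedtuple
from numbers import Real

# Hypothetical stand-in for System.Windows.Vector (illustration only).
Vec2 = namedtuple('Vec2', ['X', 'Y'])

def multiply(a, b):
    """
    Mimic the overloads of Vector.Multiply:
    (Vec2, Vec2) returns the dot product (a float);
    (scalar, Vec2) or (Vec2, scalar) returns a scaled Vec2.
    """
    if isinstance(a, Vec2) and isinstance(b, Vec2):
        return a.X * b.X + a.Y * b.Y
    if isinstance(a, Real) and isinstance(b, Vec2):
        return Vec2(a * b.X, a * b.Y)
    if isinstance(a, Vec2) and isinstance(b, Real):
        return Vec2(a.X * b, a.Y * b)
    raise TypeError('unsupported argument types')

v = Vec2(3.0, 4.0)
w = Vec2(1.0, 0.0)
dot = multiply(v, w)       # two vectors: dot product
scaled = multiply(2.0, w)  # scalar and vector: scaled vector
```

The point is just that the argument types pick the behavior, which is what the comments in getinsetpoint are describing.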

The result (shown in previous post):

I run OpenBSD on my laptop at home. So I would be using mono in my cross-platform experiment.

Microsoft just recently (Fall 2014) announced the open sourcing of the dotNet Framework and cross platform capability for it. The mono project responded very positively to this announcement. I would imagine this as being good news for IronPython too.

OpenBSD has a package for mono. From there, I just needed to download the IronPython binaries and run mono against them, or so I thought . . .

As it turns out, my script kept crashing on the overloaded Vector.Multiply method with a NotImplementedError. I tried to research things, wasn't having any luck, and brute forced the problem by wrapping the method in a C# class I called vecx:

Note (26NOV2014): I hacked this C# module up a bit too quickly and didn't have performance or elegance in mind. If you declare those Multiply methods as static you can save yourself the trouble of instantiating a new instance of the class each time you want to call them. In fact, you can do the same thing with all the Vector methods you want to use (Add, CrossProduct, etc.). I was just too hurried and too lazy. CBT

using System;

public class vecx
{
    public System.Windows.Vector vectorx;

    public vecx()
    {
        System.Windows.Vector vectorx = new System.Windows.Vector(0.0, 0.0);
        this.vectorx = vectorx;
    }

    public vecx(double x, double y)
    {
        System.Windows.Vector vectorx = new System.Windows.Vector(x, y);
        this.vectorx = vectorx;
    }

    public Double Multiply(System.Windows.Vector a, System.Windows.Vector b)
    {
        return System.Windows.Vector.Multiply(a, b);
    }

    public System.Windows.Vector Multiply(Double a, System.Windows.Vector b)
    {
        return System.Windows.Vector.Multiply(a, b);
    }

    public System.Windows.Vector Multiply(System.Windows.Vector a, Double b)
    {
        return System.Windows.Vector.Multiply(a, b);
    }
}

The command line (your paths will probably be different) text for compiling this under mono was:

$ mcs -r:/usr/local/lib/mono/4.5/WindowsBase.dll -target:library vecx.cs

The code using this faux Vector class was a little bit different (and hackish):

"""
Polygon offset problem using
dot Net Framework.

Modified for use with mono.
"""

import clr

# Hacked C# module.
VECX = '/home/carl/vectormath/IronPython/mono/vecx.dll'

clr.AddReference(VECX)

import vecx

import math
import copy

import monastery as pic

OFFSET = 0.15

def scaleadd(origin, offset, vectorx):
    """
    From a Vector representing the origin,
    a scalar offset, and a Vector, returns
    a Vector object representing a point
    offset from the origin.

    (Multiply vectorx by offset and add to origin.)
    """
    # Generic vector for use of Vector type.
    vecgeneric = vecx().vectorx

    # Multiply method that takes scalar and Vector.
    # Using cs module compiled to dll for Multiply
    # methods in mono.
    multx = vecx().Multiply(vectorx, offset)
    return vecgeneric.Add(multx, origin)

def getinsetpoint(pt1, pt2, pt3):
    """
    Given three points that form a corner (pt1, pt2, pt3),
    returns a point offset distance OFFSET to the right
    of the path formed by pt1-pt2-pt3.

    pt1, pt2, and pt3 are 2-tuples.

    Returns a Vector object.
    """
    # Generic vector for use of type.
    vecgeneric = vecx().vectorx

    origin = vecx(*pt2).vectorx
    v1 = vecx(pt1[0] - pt2[0], pt1[1] - pt2[1]).vectorx
    v1.Normalize()

    v2 = vecx(pt3[0] - pt2[0], pt3[1] - pt2[1]).vectorx
    v2.Normalize()

    v3 = copy.copy(v1)

    v1 = vecgeneric.CrossProduct(v1, v2)

    v3 = vecgeneric.Add(v3, v2)
    v3.Normalize()

    # In dotNet - Vector.Multiply is overloaded.
    # When it gets two Vector objects as arguments
    # it returns a dot product.
    # Using cs module compiled to dll for Multiply
    # methods in mono.
    cs = vecx().Multiply(v3, v2)

    # Again multiplication is overloaded.
    # Here it gets a scalar and a Vector
    # as arguments.
    # Using cs module compiled to dll for Multiply
    # methods in mono.
    a1 = vecx().Multiply(cs, v2)
    a2 = vecgeneric.Subtract(v3, a1)

    if cs > 0:
        alpha = math.sqrt(a2.LengthSquared)
    else:
        alpha = -math.sqrt(a2.LengthSquared)

    if v1 < 0.0:
        return scaleadd(origin, -1.0 * OFFSET/alpha, v3)
    else:
        return scaleadd(origin, OFFSET/alpha, v3)

def generatepoints():
    """
    Create list of offset points
    for points inset from polygon.

    Return list.
    """
    polyinset = []
    lenpolygon = len(pic.MONASTERY)
    i = 0
    poly = pic.MONASTERY
    while i < lenpolygon - 2:
        polyinset.append(getinsetpoint(poly[i],
                         poly[i + 1], poly[i + 2]))
        i += 1
    polyinset.append(getinsetpoint(poly[-2],
                     poly[0], poly[1]))
    polyinset.append(getinsetpoint(poly[0],
                     poly[1], poly[2]))

    return polyinset

Any port in a storm or whatever it takes, as they say.

Thanks again to Mr. Rafsanjani, whom I referenced in my previous post. His methodology and his detection of my earlier bug got me back on track.

And thank you for stopping by.

Sunday, November 16, 2014

Polygon Offset With pyeuclid Revisited

A few years back I did two or three posts on polygon offset. It was a learning experience that I never quite completed to my satisfaction. A kind visitor to my last post on the subject, Mr. Ahmad Rafsanjani, actually rewrote some of my code in a comment. I gave him a polite weasel answer thanking him, but dropped the effort and never felt quite right about it.

Well, as the saying goes, better late than never. He was quite correct in his assessment, but my understanding of vector math was not strong enough to prove this to myself. I was visually inspecting the results, and, given what I was dealing with at the time, they seemed OK.

Here is the picture we're trying to get (this is with Mr. Rafsanjani's code; the picture from my original code, although wrong, does not look greatly different):

In order to nail down the discrepancy in my original code, I inserted some print statements with a lot of numeric precision (28 digits to the right of the decimal) in the output:

$ more points
1.2231671842700024832595318003 1.7024195134850139687898717966
2.1231671842700023944416898303 1.7024195134850139687898717966
2.2768328157299975167404681997 2.5475804865149860312101282034
1.6635803619063778135966913396 2.5493839809701555054743948858
1.7364196380936223196300716154 3.3506160190298444057077631442
2.5205825797292722434406186949 3.3529128986463621053815131745
2.6794174202707274901058553951 4.1470871013536383387076966756
2.1360193516544989655869812850 4.1228847880778562995374159073

(etc.)

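Several of the Y coordinates above (the second column; highlighted in yellow in the original post) are mismatches in the inset offset polygon: compare the third and fourth points, the fifth and sixth, and the seventh and eighth. Each such pair of Y coordinates should represent a line parallel to the X axis; in other words, the two values should be equal. I have a bug.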

Contrast that with the numbers yielded by Mr. Rafsanjani's code:

$ more points
1.2251864530113494300422871675 1.7000000000000001776356839400
2.1251864530113491191798402724 1.7000000000000001776356839400
2.2797319075568038826418160170 2.5499999999999998223643160600
1.6642549229616445671808833140 2.5499999999999998223643160600
1.7369821956889173186766583967 3.3500000000000000888178419700
2.5229705854077835169846366625 3.3500000000000000888178419700
2.6829705854077836590931838145 4.1500000000000003552713678801
2.1880983342360056376207921858 4.1500000000000003552713678801
2.6780983342360054066944030637 4.8499999999999996447286321199
3.1219016657639944156699129962 4.8499999999999996447286321199
(etc.)

Much better. Lines that are supposed to be perfectly parallel to the X axis are, at least to 28 decimal places of printed precision and the limits of my platform and the CPython interpreter, parallel to the X axis. For what I am doing, I can more than live with that.
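
The eyeball comparison of the two listings can also be done by a few lines of Python. This is a sketch, not part of the original scripts: the function names are made up, the pairing of points (first with second, third with fourth, and so on) is assumed from the listings above, and the tolerance of 1e-12 is an arbitrary choice.

```python
def parsepoints(text):
    """Turn whitespace-separated 'x y' lines into (x, y) float tuples."""
    return [tuple(float(s) for s in line.split())
            for line in text.strip().splitlines()]

def horizontalmismatches(points, tol=1e-12):
    """
    Compare the Y values of points 0 and 1, 2 and 3, and so on.
    Return the index pairs whose Y values differ by more than tol.
    """
    bad = []
    for i in range(0, len(points) - 1, 2):
        if abs(points[i][1] - points[i + 1][1]) > tol:
            bad.append((i, i + 1))
    return bad

# First eight points from each listing above.
ORIGINAL = """
1.2231671842700024832595318003 1.7024195134850139687898717966
2.1231671842700023944416898303 1.7024195134850139687898717966
2.2768328157299975167404681997 2.5475804865149860312101282034
1.6635803619063778135966913396 2.5493839809701555054743948858
1.7364196380936223196300716154 3.3506160190298444057077631442
2.5205825797292722434406186949 3.3529128986463621053815131745
2.6794174202707274901058553951 4.1470871013536383387076966756
2.1360193516544989655869812850 4.1228847880778562995374159073
"""

CORRECTED = """
1.2251864530113494300422871675 1.7000000000000001776356839400
2.1251864530113491191798402724 1.7000000000000001776356839400
2.2797319075568038826418160170 2.5499999999999998223643160600
1.6642549229616445671808833140 2.5499999999999998223643160600
1.7369821956889173186766583967 3.3500000000000000888178419700
2.5229705854077835169846366625 3.3500000000000000888178419700
2.6829705854077836590931838145 4.1500000000000003552713678801
2.1880983342360056376207921858 4.1500000000000003552713678801
"""

originalbad = horizontalmismatches(parsepoints(ORIGINAL))
correctedbad = horizontalmismatches(parsepoints(CORRECTED))
```

With these two excerpts, originalbad lists the pairs (2, 3), (4, 5), and (6, 7), and correctedbad comes back empty.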

I've included Mr. Rafsanjani's comments in the code. My modifications to his code were mainly for the purpose of printing some things out and organizing the polygon offset part of this exercise into a module.

I've made a separate main script for gnuplot. After not looking at everything for three years I realized I had forgotten everything I ever knew about gnuplot and wanted to record it this time. The file with the 20 points for the shape (monastery.py) is available on request.

Here is the main pyeuclid/polygon offset part of the code (rafsanjanicorrection.py):

"""
Polygon offset problem using
pyeuclid and incorporating corrections
made by Ahmad Rafsanjani.
"""

# Mr. Rafsanjani's comments:

# I think there is a small bug:

# In "getinsetpoint", the vector v3 should be
# normalized before passing to "scaleadd".

# Furthermore, the final offset is not as the
# prescribed OFFSET and the angle between
# vectors should be taken into account.

# A possible solution could be:

import euclid as eu
import math
import copy

import monastery as pic

OFFSET = 0.15

def scaleadd(origin, offset, vectorx):
    """
    From a vector representing the origin,
    a scalar offset, and a vector, returns
    a Vector3 object representing a point
    offset from the origin.

    (Multiply vectorx by offset and add to origin.)
    """
    multx = vectorx * offset
    return multx + origin

def getinsetpoint(pt1, pt2, pt3):
    """
    Given three points that form a corner (pt1, pt2, pt3),
    returns a point offset distance OFFSET to the right
    of the path formed by pt1-pt2-pt3.

    pt1, pt2, and pt3 are 2-tuples.

    Returns a Vector3 object.
    """
    origin = eu.Vector3(pt2[0], pt2[1], 0.0)
    v1 = eu.Vector3(pt1[0] - pt2[0], pt1[1] - pt2[1], 0.0)
    v1.normalize()

    v2 = eu.Vector3(pt3[0] - pt2[0], pt3[1] - pt2[1], 0.0)
    v2.normalize()

    v3 = copy.copy(v1)
    v1 = v1.cross(v2)
    v3 += v2
    v3.normalize()

    cs = v3.dot(v2)

    a1 = cs * v2
    a2 = v3 - a1

    if cs > 0:
        alpha = math.sqrt(a2.magnitude_squared())
    else:
        alpha = -math.sqrt(a2.magnitude_squared())

    if v1.z < 0.0:
        return scaleadd(origin, -1.0 * OFFSET/alpha, v3)
    else:
        return scaleadd(origin, OFFSET/alpha, v3)

def generatepoints():
    """
    Create list of offset points
    (pyeuclid.Vector3 objects) for
    points inset from polygon.

    Return list.
    """
    polyinset = []
    lenpolygon = len(pic.MONASTERY)
    i = 0
    poly = pic.MONASTERY
    while i < lenpolygon - 2:
        polyinset.append(getinsetpoint(poly[i],
                         poly[i + 1], poly[i + 2]))
        i += 1
    polyinset.append(getinsetpoint(poly[-2],
                     poly[0], poly[1]))
    polyinset.append(getinsetpoint(poly[0],
                     poly[1], poly[2]))

    return polyinset
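
The geometry behind the correction can be checked without pyeuclid. What follows is a rough sketch using plain tuples; the names unit and insetpoint are made up for this illustration. The bisector of the corner is a unit vector, cs is the cosine of the half angle between the bisector and an edge, and alpha works out to the sine of that half angle, so moving OFFSET/alpha along the bisector leaves the new point OFFSET away from both edges.

```python
import math

OFFSET = 0.15

def unit(x, y):
    """Return (x, y) scaled to length 1."""
    length = math.hypot(x, y)
    return (x / length, y / length)

def insetpoint(pt1, pt2, pt3, offset=OFFSET):
    """
    Plain-tuple version of getinsetpoint: move pt2 along the
    bisector of the corner by offset / sin(half angle).
    """
    v1 = unit(pt1[0] - pt2[0], pt1[1] - pt2[1])
    v2 = unit(pt3[0] - pt2[0], pt3[1] - pt2[1])
    # z component of the cross product; its sign gives the turn direction.
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # Unit bisector of the corner.
    bis = unit(v1[0] + v2[0], v1[1] + v2[1])
    # cos(half angle), then the part of bis perpendicular to v2.
    cs = bis[0] * v2[0] + bis[1] * v2[1]
    perp = (bis[0] - cs * v2[0], bis[1] - cs * v2[1])
    alpha = math.hypot(perp[0], perp[1])  # sin(half angle)
    if cs <= 0:
        alpha = -alpha
    scale = offset / alpha
    if cross < 0.0:
        scale = -scale
    return (pt2[0] + scale * bis[0], pt2[1] + scale * bis[1])

# Right-angle corner: down the Y axis, then out along the X axis.
# The result is approximately (-0.15, -0.15), 0.15 from both axes.
corner = insetpoint((0.0, 1.0), (0.0, 0.0), (1.0, 0.0))
```

Three collinear points (a straight 180 degree "corner") give a zero length bisector, and this sketch raises ZeroDivisionError there.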

The file that prints stuff out and summons gnuplot (writetofile.py):

"""
Write vector points to file.

Show in gnuplot.
"""

# import blogpost as vecx
import rafsanjanicorrection as vecx
import os

# We're using gnuplot.
# It doesn't like commas, so
# we'll use whitespace (6).
FMT = '{0:30.28f} {1:30.28f}'
FILEX = 'points'
ORIGSHAPE = 'originalshape'

PLOTCMD = 'set xrange[0.0:6.0]\n'
PLOTCMD += 'set yrange[0.0:6.0]\n'
PLOTCMD += 'plot "{0:s}" with lines lt rgb "red" lw 4, '
PLOTCMD += '"{1:s}" with lines lt rgb "blue" lw 4'
GNUPLOTFILE = 'plotfile'
GNUPLOT = 'gnuplot -p {:s}'.format(GNUPLOTFILE)

pts = vecx.generatepoints()
f = open(FILEX, 'w')
i = 1
for ptx in pts:
    print('Printing point {0:d} . . .'.format(i))
    print >> f, FMT.format(ptx.x, ptx.y)
    i += 1
f.close()
# Plot original as well.
# XXX - repetitive - make function.
i = 0
f = open(ORIGSHAPE, 'w')
for ptx in vecx.pic.MONASTERY:
    print('Printing point {0:d} of original shape . . .'.format(i))
    print >> f, FMT.format(ptx[0], ptx[1])
    i += 1
f.close()

f = open(GNUPLOTFILE, 'w')
print >> f, PLOTCMD.format(ORIGSHAPE, FILEX)
f.close()
os.system(GNUPLOT)
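
The XXX comment in the script asks for a function in place of the two nearly identical loops. One possible shape for it is sketched below; writepoints is a made-up name, and the sketch assumes every point arrives as an (x, y) pair.

```python
FMT = '{0:30.28f} {1:30.28f}'

def writepoints(filename, points, fmt=FMT):
    """
    Write one 'x y' line per point to filename.

    points is an iterable of (x, y) pairs.

    Returns the number of points written.
    """
    count = 0
    with open(filename, 'w') as f:
        for x, y in points:
            f.write(fmt.format(x, y) + '\n')
            count += 1
    return count
```

The pyeuclid vectors would need unpacking first, something like writepoints(FILEX, [(p.x, p.y) for p in pts]); the original shape could go in as is, writepoints(ORIGSHAPE, vecx.pic.MONASTERY), provided its points are 2-tuples.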

pyeuclid, to the best of my knowledge, runs only in Python 2.7 at the moment. In any case, I got an error on the Python 3.4 install with setup.py so I stuck with 2.7.

Thanks to Mr. Rafsanjani for his help with this and for the rest of you for stopping by.

Monday, November 3, 2014

MeetBSD California 2014 Recap

I am returning from MeetBSD in San Jose, California. This isn't a Python-related post per se, but the BSD family of operating systems maintains packages and ports for Python and Python third party libraries, and use of Python on these systems is significant both in the open source development and commercial spheres.

The structure of the conference is a brief weekend unconference. Nonetheless some of the talks were more than worthy of a full fledged mega-con, and the rest were quality. It was a good deal.

Venue: the conference was held at Western Digital. WD sells a variety of hardware. The product they were pushing was a several terabyte little box that updates wirelessly (but not by Bluetooth).

We met in a rectangular conference room. All of Silicon Valley seems to me to be an endless office park with nice weather and some landscaped spots (I've included the obligatory Strelitzia/bird of paradise pic from the conference hotel entrance below). It was a fairly intimate setting. The food (a variety of sandwiches) was good. We were warned ahead of time that Wi-Fi was limited; I brought my own Verizon Jetpack unit so it wasn't an issue for me.

Talks (that I attended):

1) Rick Reed, “WhatsApp: Half a billion unsuspecting FreeBSD users” - Erlang and FreeBSD at WhatsApp used for scaling. Now 600 million users. It was a good talk, but I wasn't awake and some of it went over my head.

2) Jordan Hubbard, “FreeBSD: The Next 10 Years” Good talk; I hated it :-(

Hubbard's leaving Apple a couple years ago and signing on with iXSystems (a sponsor and essentially the organizer of this conference) made a big splash. He is an accomplished dev and a good guy by all accounts. His ideas are, on many levels, very valid.

I am primarily an OpenBSD user. I run FreeBSD on my RPi and on a spare laptop for easy access to Java. The two OS's have similar philosophies in some respects (correctness, BSD license, etc.). There is cross-pollination when it comes to operating system components, apps, and drivers. But where OpenBSD unapologetically maintains new releases for older hardware and uncompromisingly adheres to its leader's approach to security and development, FreeBSD in the framework of Hubbard's talk is looking more towards the future and making changes to attract younger talented core committers and target more modern (read mobile) platforms. Telemetry, scrapping development on older platforms "ruthlessly," getting younger devs involved by providing work that's interesting to them - all this stuff is important for FreeBSD going forward. At one point he even <gasp> suggested systemd as a good strategy for Linux that FreeBSD should, at least in principle if not in form, emulate.

FreeBSD is everywhere - or at least in a lot of places companies just don't make a big deal of. Inside cable (connections) was one example. In order to accommodate mobile and embedded environments, the OS, although well suited to these platforms now, needs to change.

A lot of this in my mind goes against OpenBSD's philosophy - purity and security at all costs. My personal philosophy lies with the OpenBSD approach, but I may well be wrong. Hubbard is a guy with a lot of industry know how and experience and I am a geologist who uses OpenBSD. He is probably right, but I don't want my fun to stop, so I'm sticking with OpenBSD even if death awaits us . . .

3) David Maxwell, "The Unix command pipeline - using Unix in the renewable energy era"

I always liked Maxwell. He's a Canadian guy and a NetBSD devotee.

His talk was about a command line app he's putting together for better tracking of piped commands on the UNIX command line and for reproducing, referencing, and inspecting them retroactively in a way that's easier than what you have to do now. I think it's got potential and would like to see it succeed.

After the angst I felt over Hubbard's talk, this was a welcome relief. The UNIX command line is something everyone, or most everyone at the con knows and loves. Everyone uses piped commands. This is a useful approach to a common problem - that's something we can all agree on. My favorite talk of the conference (that I attended).

4) Alex Rosenberg, "Meet PlayStation 4"

By far and away the coolest talk. Rosenberg presented this well and spoke honestly and as openly as he could as a member of a big commercial project about specifics. Games require so much optimization at such a low level. Although this theme came up in a number of the talks, on the PlayStation project it's critical. Essentially, the best hardware and hardware architecture for the project is selected for a given product lifecycle (10 years? IIRC) then you hammer at it with software modifications to get every last bit of efficiency out of it.

It's not like there's a standard laptop install of FreeBSD on PlayStation 4 and you let it rip with your happy traditional UNIX OS. They're optimizing LLVM and clang (the compiler and linkers), talking directly to the metal as much as possible, and just generally nailing performance at the lowest level of the architecture (after they've gotten the low hanging fruit up top, of course).

Another theme that came up in almost all the talks, but especially in this one, was the BSD license. Granted, it was a BSD conference, so organizers and attendees have a bias. Nonetheless, it appears that licensing is really critical in the decision to adopt open source software and operating systems. "Business friendly" nowadays often has "capitalism at its worst" overtones. Still, it was a theme: the BSD license is the "business friendly" one whereas the GPL, particularly the GPL3, is not . . .

I'm not a gamer, but I enjoyed this. Rosenberg is really easy to talk to as well. He let me take that pic up close when we were posing for the group pic after his talk.

5) Brendan Gregg, "Performance Analysis"

Gregg works for Netflix. He's written a lot of dtrace scripts (including numerous Python ones) and has them readily available on Github.

I found myself wishing I knew more about the subject, because performance monitoring is a really cool netadmin problem when, like Netflix, you're dealing with huge bandwidth challenges (as in other talks, so much comes down to optimization).

That said, Gregg presented some graphical tools that are useful (I'll get the names wrong, so I won't try) - basically histogram-like, color coded performance charts with labels for processes. You don't have to run your own Netflix to benefit from these and he's made everything open source and available. If I were a netadmin I would jump on this. I've got to get smarter first before I can benefit from these tools.

Gregg has a soft British accent and a very amiable demeanor. He was the first talk in the morning. It was like a lullaby. This is one I need to revisit on the videos posted online because it's worth it.

6) Corey Vixie, "Web Apps on Embedded BSD..."

The iXSystems surprise talk, but a good one. The youngster Vixie briefed us a bit on what iXSystems is doing with the web presentation layer (for lack of a better description) of the FreeNAS implementation.

He started off by saying static web pages are, at least for apps like FreeNAS, not the way to go anymore. Refreshing the DOM (Document Object Model) at regular intervals is not going to work well. He then introduced us to a number of mature and nascent JavaScript/web technologies, some of which no one in the room had yet heard of. Basically he had to rewrite the "old" Django/other technologies implementation to accommodate better simulation of a desktop app in the browser.

The specifics were not something I could follow well because of my ignorance. There was talk of an Open Source, BSD licensed Facebook framework whose name I can't recall, a one-way change propagation architecture for updating the dynamic web page, and, as always, optimization of the process. I asked him about Django after the talk. He said it was the best thing a couple years ago for this app, but now they needed something that could interact directly with the browser - namely JavaScript - it comes down to fine-grained control and optimization.

One humorous interlude during the Q & A was my asking him if he was indeed related to Paul Vixie, historical UNIX tools author (Vixie Cron), to which he replied, "This is the part of my talk where I say, 'I am Worf, son of Mogh.'" Anyone with a sense of humor and a knowledge of STTNG can't be all bad ;-)

A few people pics:

Dru Lavigne. Without the BSDA cert program she helped found, I would never have gotten over the hump learning UNIX. We differ on our choice of specific BSD, but I still consider her my UNIX mentor.

iXSystems old timers Denise and Matt working out conference specifics.

FreeBSD Foundation rep Anne.

Conclusion: MeetBSD is an affordable, pretty meaty con if you like UNIX, hardware, and topics about optimization and scale. It is, fortunately or unfortunately, a pretty well kept secret.

Thanks for stopping by.

Friday, October 31, 2014

Gtk.TreeView (grid view) with mono, gtk-sharp, and IronPython

The post immediately prior to this one was an attempt to reproduce Windows.Forms Calendar controls in Gtk for cross platform (Windows/*nix) effective rendering.

This time I am attempting to get familiar with gtk-sharp/Gtk's version of a grid view - the Gtk.TreeView object. Some of the gtk-sharp documentation suggests the NodeView object would be easier to use. I had some trouble instantiating the objects associated with the NodeView and went with the TreeView instead in the hopes of getting more control.

The Windows.Forms GridView I did years ago is here. It became apparent to me shortly after embarking on this journey that I would be hard pressed to recreate all the functionality of that script in a timely manner. I settled for a tabular view of drillhole data (fabricated, mock data) with some custom formatting.

Aside: this is typically how mineral exploration drillhole data (core, reverse circulation drilling) is presented in tabular format - a series of from-to intervals with assay values. Assuming the assays are all separate elements, the reported weight percents should not sum to more than 100%, and never do unless someone fat fingers a decimal place. I've projected a couple screaming hot polymetallic drill holes that end near surface (lack of funding for drilling), but show enough promise that the new mining town of Trachteville (the drill hole name CBT-BNZA stands for CBT-Bonanza) will spring up there at any moment . . . one can dream.
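
The 100% rule is easy to turn into a sanity check. The sketch below is not part of the script; assaysumok is a made-up name, and the values are the first interval of the mock data used in the script.

```python
# Mock assay values (weight percent) for one from-to interval.
interval = {'assay1': 22.27, 'assay2': 4.93,
            'assay3': 18.75, 'assay4': 35.18}

def assaysumok(assays, limit=100.0):
    """Return True if the assay weight percents total no more than limit."""
    return sum(assays.values()) <= limit

ok = assaysumok(interval)

# A decimal point slipped one place on assay4 (35.18 becomes 351.8).
fatfingered = dict(interval, assay4=351.8)
bad = assaysumok(fatfingered)
```

Here ok is True and bad is False, which is the fat fingered decimal place getting caught.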

The data store object for the grid view, Gtk.ListStore, would not instantiate in IronPython. I was not the only person to have experienced this problem (I cannot locate the link to the mailing list thread or forum reference, but like the big fish that got away, I swear I saw it). I didn't want to drop the effort just because of that, so I hacked and compiled some C# code:

public class storex
{
    public Gtk.ListStore drillhole =
                            // 7 columns
                            // drillhole id
          new Gtk.ListStore (typeof (string),
                            // from
                            typeof (double),
                            // to
                            typeof (double),
                            // assay1
                            typeof (double),
                            // assay2
                            typeof (double),
                            // assay3
                            typeof (double),
                            // assay4
                            typeof (double));
}

The mono command on Windows was

C:\UserPrograms\Mono-3.2.3>mcs -pkg:gtk-sharp-2.0 /target:library C:\UserPrograms\IronPythonGUI\storex.cs

Those are my file paths; locations depend on where you install things like mono and IronPython.

Anyway, I got my dll and I was off to the races. Getting to know the Gtk and gtk-sharp object model proved challenging for me. I'm glad I got some familiarity with it, but it would take me longer to do something in Gtk than it did with Windows.Forms. The most fun and gratifying part of the project was getting the custom formatting to work with a Gtk.TreeCellDataFunc. I used a function that yielded specific functions for each column - something that's really easy to do in Python.
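
The "function that yields functions" idea is easier to see with Gtk stripped out. This is a simplified sketch of the same pattern as genericfloatformat in the script; makeformatter and the tuple it returns are made up for illustration and stand in for setting the cell renderer's Text and Foreground.

```python
BLAZINGCUTOFF = 10.0

def makeformatter(floatfmt, highlight):
    """
    Return a function that formats one float for display and
    picks a foreground color for it.
    """
    def formatvalue(value):
        text = floatfmt.format(value)
        if highlight and value > BLAZINGCUTOFF:
            return text, 'red'
        return text, 'black'
    return formatvalue

# One formatter per column, each remembering its own settings.
formatters = {'from': makeformatter('{:>5.1f}', False),
              'assay1': makeformatter('{:>4.2f}', True)}

fromcell = formatters['from'](15.3)
assaycell = formatters['assay1'](22.27)
```

Each inner function closes over its own floatfmt and highlight, so one factory covers every column.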

Anyway, here are a couple screenshots and the IronPython code:

The OpenBSD one below turned out pretty good, but the Windows one had a little double line underneath the first row - it looked as though it was still trying to select that row when I told it specifically not to. I'm not a design perfectionist Steve Jobs type, but niggling nits like that drive me batty. For now, though, it's best I publish the code and move on.

#!/usr/local/bin/mono /home/carl/IronPython-2.7.4/ipy64.exe

import clr

GTKSHARP = 'gtk-sharp'
PANGO = 'pango-sharp'

# Mock store C#
STOREX = 'storex'

clr.AddReference(GTKSHARP)
clr.AddReference(PANGO)

# C# module compiled for this project.
# Problems with Gtk.ListStore in IronPython.
clr.AddReference(STOREX)

import Gtk
import Pango

import storex

TITLE = 'Gtk.TreeView Demo (Drillholes)'
MARKUP = '{:s}'
MARKEDUPTITLE = MARKUP.format(TITLE)

CENTERED = 0.5
RIGHT = 1.0

WINDOWWIDTH = 350

COURFONTREGULAR = 'Courier New 12'
COURFONTBOLD = 'Courier New Bold 12'

DHNAME = 'DH_CBTBNZA-{:>02d}'
DHNAMELABEL = 'drillhole'
FROM = 'from'
TO = 'to'
ASSAY1 = 'assay1'
ASSAY2 = 'assay2'
ASSAY3 = 'assay3'
ASSAY4 = 'assay4'

FP1FMT = '{:>5.1f}'
FP2FMT = '{:>4.2f}'

DHDATAX = {(DHNAME.format(1), 0.0):{TO:8.7,
 ASSAY1:22.27,
 ASSAY2:4.93,
 ASSAY3:18.75,
 ASSAY4:35.18},
 (DHNAME.format(1), 8.7):{TO:15.3,
 ASSAY1:0.27,
 ASSAY2:0.09,
 ASSAY3:0.03,
 ASSAY4:0.22},
 (DHNAME.format(1), 15.3):{TO:25.3,
 ASSAY1:2.56,
 ASSAY2:11.34,
 ASSAY3:0.19,
 ASSAY4:13.46},
 (DHNAME.format(2), 0.0):{TO:10.0,
 ASSAY1:0.07,
 ASSAY2:1.23,
 ASSAY3:4.78,
 ASSAY4:5.13},
 (DHNAME.format(2), 10.0):{TO:20.0,
 ASSAY1:44.88,
 ASSAY2:12.97,
 ASSAY3:0.19,
 ASSAY4:0.03}}

FIELDS = [DHNAMELABEL, FROM, TO, ASSAY1, ASSAY2, ASSAY3, ASSAY4]
BOLDEDCOLUMNS = [DHNAMELABEL, FROM, TO]
NONKEYFIELDS = FIELDS[2:]

BLAZINGCUTOFF = 10.0

def genericfloatformat(floatfmt, index):
    """
    For cell formatting in Gtk.TreeView.

    Returns a function to format floats
    and to format floats' foreground color
    based on cutoff value.

    floatfmt is a format string.

    index is an int that indicates the
    column being formatted.
    """
    def setfloatfmt(treeviewcolumn, cellrenderer, treemodel, treeiter):
        cellrenderer.Text = floatfmt.format(treemodel.GetValue(treeiter, index))
        # If it is one of the assay value columns.
        # XXX - not generic.
        if index > 2:
            if treemodel.GetValue(treeiter, index) > BLAZINGCUTOFF:
                cellrenderer.Foreground = 'red'
            else:
                cellrenderer.Foreground = 'black'
    return Gtk.TreeCellDataFunc(setfloatfmt)

class TreeViewTest(object):
    def __init__(self):
        Gtk.Application.Init()
        self.window = Gtk.Window('')
        # DeleteEvent - copied from Gtk demo on internet.
        self.window.DeleteEvent += self.DeleteEvent
        # Frame property provides a frame and title.
        self.frame = Gtk.Frame(MARKEDUPTITLE)
        self.tree = Gtk.TreeView()
        self.tree.EnableGridLines = Gtk.TreeViewGridLines.Both
        self.frame.Add(self.tree)

        # Fonts for formatting.
        self.fdregular = Pango.FontDescription.FromString(COURFONTREGULAR)
        self.fdbold = Pango.FontDescription.FromString(COURFONTBOLD)

        # C# module
        self.store = storex().drillhole

        self.makecolumns()
        self.adddata()
        self.tree.Model = self.store

        self.formatcolumns()
        self.formatcells()
        self.prettyup()

        self.window.Add(self.frame)
        self.window.ShowAll()
        # Keep text viewable - size no smaller than intended.
        self.window.AllowShrink = False
        # XXX - hack to keep lack of gridlines on edges of
        # table from showing.
        self.window.AllowGrow = False
        # Unselect everything for this demo.
        self.tree.Selection.UnselectAll()
        Gtk.Application.Run()

    def makecolumns(self):
        """
        Fill in columns for TreeView.
        """
        self.columns = {}
        for fieldx in FIELDS:
            self.columns[fieldx] = Gtk.TreeViewColumn()
            self.columns[fieldx].Title = fieldx
            self.tree.AppendColumn(self.columns[fieldx])

    def formatcolumns(self):
        """
        Make custom labels for column headers.

        Get each column properly justified (all
        are right justified, floating point numbers
        except for the drillhole 'number' -
        actually a string).
        """
        self.customlabels = {}

        for fieldx in FIELDS:
            # This centers the labels at the top.
            self.columns[fieldx].Alignment = CENTERED
            self.customlabels[fieldx] = Gtk.Label(self.columns[fieldx].Title)
            self.customlabels[fieldx].ModifyFont(self.fdbold)
            # 120 is about right for from, to, and assay columns.
            self.columns[fieldx].MinWidth = 120
            self.customlabels[fieldx].ShowAll()
            self.columns[fieldx].Widget = self.customlabels[fieldx]
            # ShowAll required for new label to take.
            self.columns[fieldx].Widget.ShowAll()

    def formatcells(self):
        """
        Add and format cell renderers.
        """
        self.cellrenderers = {}

        for fieldx in FIELDS:
            self.cellrenderers[fieldx] = Gtk.CellRendererText()
            self.columns[fieldx].PackStart(self.cellrenderers[fieldx], True)
            # Drillhole 'number' (string)
            if fieldx == FIELDS[0]:
                self.cellrenderers[fieldx].Xalign = CENTERED
                self.columns[fieldx].AddAttribute(self.cellrenderers[fieldx],
                                                  'text', 0)
            else:
                self.cellrenderers[fieldx].Xalign = RIGHT
                try:
                    self.columns[fieldx].AddAttribute(self.cellrenderers[fieldx],
                                                      'text', FIELDS.index(fieldx))
                except ValueError:
                    print('\n\nProblem with field definitions; field not found.\n\n')
        for fieldx in BOLDEDCOLUMNS:
            self.cellrenderers[fieldx].Font = COURFONTBOLD
            self.columns[fieldx].Widget.ShowAll()

        # XXX - not very generic, but better than doing them one by one.
        # from, to columns.
        for x in xrange(1, 3):
            self.columns[FIELDS[x]].SetCellDataFunc(self.cellrenderers[FIELDS[x]],
                                                    genericfloatformat(FP1FMT, x))
        # assay<x> columns.
        for x in xrange(3, 7):
            self.columns[FIELDS[x]].SetCellDataFunc(self.cellrenderers[FIELDS[x]],
                                                    genericfloatformat(FP2FMT, x))

    def usemarkup(self):
        """
        Refreshes UseMarkup property on widgets (labels)
        so that they display properly and without
        markup text.
        """
        # Have to refresh this property each time.
        self.frame.LabelWidget.UseMarkup = True

    def prettyup(self):
        """
        Get Gtk objects looking the way we
        intended.
        """
        # Try to get Courier New on treeview.
        self.tree.ModifyFont(self.fdregular)
        # Get rid of line.
        self.frame.Shadow = Gtk.ShadowType.None
        self.usemarkup()

    def adddata(self):
        """
        Put data into store.
        """
        # XXX - difficulty figuring out sorting
        # function for TreeView. Hack it
        # with dictionary here.
        keytuples = [key for key in DHDATAX]
        keytuples.sort()
        datax = []
        for tuplex in keytuples:
            # XXX - side effect comprehension.
            # Not great for readability,
            # but compact.
            [datax.append(x) for x in tuplex]
            for fieldx in NONKEYFIELDS:
                datax.append(DHDATAX[tuplex][fieldx])
            self.store.AppendValues(*datax)
            # Reinitialize data row list.
            datax = []

    def DeleteEvent(self, widget, event):
        Gtk.Application.Quit()

if __name__ == '__main__':
    TreeViewTest()
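
Two ideas in the listing above can be tried without Gtk or IronPython: sorting a dictionary keyed by (drillhole, from) tuples into flat rows, and building per-column formatters as closures. What follows is only a sketch under assumptions. The sample values, the format strings, and the names flatten_rows and make_formatter are made up for illustration and are not part of the original script.

```python
# Gtk-free sketch; all names and sample values here are hypothetical.

CUTOFF = 10.0

# (drillhole, from) -> {field: value}, same shape as DHDATAX.
SAMPLE = {('DH-2', 0.0): {'to': 10.0, 'assay1': 0.07},
          ('DH-1', 15.3): {'to': 25.3, 'assay1': 12.56},
          ('DH-1', 0.0): {'to': 15.3, 'assay1': 0.22}}
NONKEY = ['to', 'assay1']


def flatten_rows(data, nonkeyfields):
    """Return one flat list per record, sorted by the tuple key."""
    rows = []
    for key in sorted(data):
        row = list(key)
        row.extend(data[key][field] for field in nonkeyfields)
        rows.append(row)
    return rows


def make_formatter(floatfmt, highlight):
    """
    Return a function mapping a float to (text, color).

    floatfmt and highlight are captured by the closure, much as
    genericfloatformat captures floatfmt and index.
    """
    def fmt(value):
        color = 'red' if highlight and value > CUTOFF else 'black'
        return floatfmt.format(value), color
    return fmt


rows = flatten_rows(SAMPLE, NONKEY)
depthfmt = make_formatter('{:.1f}', False)
assayfmt = make_formatter('{:.2f}', True)

for hole, from_, to, assay in rows:
    print(hole, depthfmt(from_)[0], depthfmt(to)[0], assayfmt(assay))
```

Returning a (text, color) pair keeps the formatting decision testable on its own; in the Gtk version the same decision is written straight onto the cell renderer.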

Thanks for stopping by.

Thursday, October 30, 2014

Mono gtk-sharp IronPython CalendarView

A number of years ago I did a post on the IronPython Cookbook site about the Windows.Forms Calendar control. I could never get the thing to render nicely on *nix operating systems (BSD family). It sounds as though Windows.Forms development for mono (and in general) is kind of dead, so there is not much hope that solution/example will ever render nicely on *nix. Recently I've been playing with mono and decided to give gtk-sharp a shot with IronPython.

Quick disclaimers:

1) I suspect from the examples I've seen on the internet that PyGtk is a little easier to deal with than gtk-sharp. That's OK; I wanted to use IronPython and have the rest of the mono/dotNet framework available, so I went through the extra trouble to forego CPython and PyGtk and go with IronPython and gtk-sharp instead.

2) The desktop is not the most cutting edge or sexy platform in 2014. Nonetheless, where I work it is alive and well. When I no longer see engineers hacking solutions in Excel and VBA, I'll consider the possibility of outliving the desktop. Right now I'm not hopeful :-\

The results aren't bad, at least as far as rendering goes. I couldn't get the Courier font to take on OpenBSD, but the Gtk Calendar control looks acceptable. All in all, I was OK with the results on both Windows and OpenBSD. I've heard Gtk doesn't do quite as well on Apple products, but I don't own a Mac to test with. Here are a couple screenshots:

I run the cwm window manager on OpenBSD and have it set up to cut out borders on windows, hence the more minimalist look to the control there.

IronPython output on *nix has always come out in yellow or white - it doesn't show up on a white background, which I prefer. In order to get around this, I run an xterm with a black background:

xterm -bg black -fg white

Here is the code for the gtk-sharp Gtk.Calendar control:

#!/usr/local/bin/mono /home/carl/IronPython-2.7.4/ipy64.exe

import clr

GTKSHARP = 'gtk-sharp'
PANGO = 'pango-sharp'

clr.AddReference(GTKSHARP)
clr.AddReference(PANGO)

import Gtk
import Pango

import datetime

TITLE = 'Gtk.Calendar Demo'
MARKUP = '{:s}'
MARKEDUPTITLE = MARKUP.format(TITLE)

INFOMSG = '\n\n Program set to run for:\n\n '
INFOMSG += '{:%Y-%m-%d}\n\n'

DATEDIFFMSG = '\n\n '
DATEDIFFMSG += 'There are {0:d} days between the\n'
DATEDIFFMSG += ' beginning of the epoch and\n'
DATEDIFFMSG += ' {1:%Y-%m-%d}.\n\n'

ALIGNMENTPARAMS = (0.0, 0.5, 0.0, 0.0)

WINDOWWIDTH = 350

CALENDARFONT = 'Courier New Bold 12'

class CalendarTest(object):
    inthebeginning = datetime.datetime.fromtimestamp(0)
    # Debug info - make sure beginning of epoch really
    # is +midnight, Jan 1, 1970 GMT.
    print(inthebeginning)
    def __init__(self):
        Gtk.Application.Init()
        self.window = Gtk.Window(TITLE)
        # DeleteEvent - copied from Gtk demo on internet.
        self.window.DeleteEvent += self.DeleteEvent
        # Frame property provides a frame and title.
        self.frame = Gtk.Frame(MARKEDUPTITLE)
        self.calendar = Gtk.Calendar()
        # Handles date selection event.
        self.calendar.DaySelected += self.dateselect
        # Sets up text for labels.
        self.getcaltext()
        # Puts little box around text.
        self.datelabelframe = Gtk.Frame()
        # Try to get datelabel to align with other label.
        self.datelabelalignment = Gtk.Alignment(*ALIGNMENTPARAMS)
        self.datelabel = Gtk.Label(self.caltext)
        self.datelabelalignment.Add(self.datelabel)
        self.datelabelframe.Add(self.datelabelalignment)
        # Puts little box around text.
        self.datedifflabelframe = Gtk.Frame()
        self.datedifflabelalignment = Gtk.Alignment(*ALIGNMENTPARAMS)
        self.datedifflabel = Gtk.Label(self.timedifftext)
        self.datedifflabelalignment.Add(self.datedifflabel)
        self.datedifflabelframe.Add(self.datedifflabelalignment)
        self.vbox = Gtk.VBox()
        self.vbox.PackStart(self.datelabelframe)
        self.vbox.PackStart(self.datedifflabelframe)
        self.vbox.PackStart(self.calendar)
        self.frame.Add(self.vbox)
        self.window.Add(self.frame)
        self.prettyup()
        self.window.ShowAll()
        # Keep text viewable - size no smaller than intended.
        self.window.AllowShrink = False
        Gtk.Application.Run()

    def getcaltext(self):
        """
        Get messages for run date.
        """
        # Calendar month is 0 based.
        yearmonthday = self.calendar.Year, self.calendar.Month + 1, self.calendar.Day
        chosendate = datetime.datetime(*yearmonthday)
        self.caltext = INFOMSG.format(chosendate)
        # For reporting of number of days since beginning of epoch.
        timediff = chosendate - CalendarTest.inthebeginning
        self.timedifftext = DATEDIFFMSG.format(timediff.days, chosendate)

    def usemarkup(self):
        """
        Refreshes UseMarkup property on widgets (labels)
        so that they display properly and without
        markup text.
        """
        # Have to refresh this property each time.
        self.frame.LabelWidget.UseMarkup = True
        self.datelabel.UseMarkup = True
        self.datedifflabel.UseMarkup = True

    def prettyup(self):
        """
        Get Gtk objects looking the way we
        intended.
        """
        # Try to make frame wider.
        # XXX
        # Works nicely on Windows - try on Unix.
        # Allows bold, etc.
        self.usemarkup()
        self.frame.SetSizeRequest(WINDOWWIDTH, -1)
        # Get rid of line in middle of text on title.
        self.frame.Shadow = Gtk.ShadowType.None
        # Try to get Courier New on calendar.
        fd = Pango.FontDescription.FromString(CALENDARFONT)
        self.calendar.ModifyFont(fd)
        self.datelabel.Justify = Gtk.Justification.Left
        self.datedifflabel.Justify = Gtk.Justification.Left
        self.window.Title = ''
        self.usemarkup()

    def dateselect(self, widget, event):
        self.getcaltext()
        self.datelabel.Text = self.caltext
        self.datedifflabel.Text = self.timedifftext
        self.prettyup()

    def DeleteEvent(self, widget, event):
        Gtk.Application.Quit()

if __name__ == '__main__':
    CalendarTest()
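
The date arithmetic in getcaltext() can be checked without a Gtk window. This is a minimal sketch, assuming a zero-based month index as the listing treats Gtk.Calendar's Month property. The helper name days_since_epoch is hypothetical. It also uses a fixed naive epoch rather than datetime.fromtimestamp(0), which returns local time and so can differ from midnight depending on the machine's time zone.

```python
import datetime

# Fixed, naive epoch: an assumption made for this sketch.
EPOCH = datetime.datetime(1970, 1, 1)


def days_since_epoch(year, month0, day):
    """
    Days between the epoch and the given date.

    month0 is zero based (0 = January), so 1 is added
    before handing it to datetime.
    """
    chosen = datetime.datetime(year, month0 + 1, day)
    return (chosen - EPOCH).days


# October 30, 2014 is month index 9.
print(days_since_epoch(2014, 9, 30))
```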

Thanks for stopping by.

Monday, October 20, 2014

subprocess.Popen() or Abusing a Home-grown Windows Executable

Each month I redo 3D block model interpolations for a series of open pits at a distant mine. Those of you who follow my Twitter feed often see me tweet, "The 3D geologic block model interpolation chuggeth . . ." What's going on is that I've got all the processing power maxed out dealing with millions of model blocks and thousands of data points. The machine heats up and, with the fan going, sounds like a DC-9 warming up before flight.

All that said, running everything roughly in parallel is more efficient time-wise than running it sequentially. An hour of chugging is better than four. The way I've been doing this is with the Popen class from the Python (2.7) subprocess module, running my five interpolated values in parallel. Our Python programmer Lori originally wrote this to run in sequence for a different set of problems. I bastardized it for my own use.

The subprocess part of the code is relatively straightforward. Function startprocess() in my code covers that.
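
As a rough illustration of what startprocess() does, here is a sketch that starts a child with stdin on a pipe and stdout routed to a file, then answers the child's prompt. The child is a stand-in written in Python (an assumption, since the real executable cannot be shown), and run_child is a hypothetical name. A well-behaved child like this one needs no time.sleep() calls; the vendor program in this post apparently does.

```python
import subprocess
import sys
import tempfile

# Stand-in for the interactive executable: reads one line from
# stdin and echoes it. Purely hypothetical.
CHILD = "import sys; name = sys.stdin.readline().strip(); print('got ' + name)"


def run_child(answer):
    """Start the child, answer its prompt, return what it printed."""
    with tempfile.TemporaryFile(mode='w+') as out:
        proc = subprocess.Popen([sys.executable, '-c', CHILD],
                                stdin=subprocess.PIPE,
                                stdout=out,
                                universal_newlines=True)
        # The child blocks on readline() until the newline arrives.
        proc.stdin.write(answer + '\n')
        proc.stdin.close()
        proc.wait()
        # Rewind the shared file to read what the child wrote.
        out.seek(0)
        return out.read()


print(run_child('driver.csv'))
```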

What makes this problem a little more challenging:

1) it's a vendor supplied executable we're dealing with . . . without an API or source . . . that's interactive (you can't feed it the config file path; it asks for it). This results in a number of time.sleep() and <process>.stdin.write() calls that can be brittle.

2) getting the processes started, as I just mentioned, is easy. Finding out when to stop, or kill them, requires knowledge of the app and how it generates output. I've gone for an ugly, but effective check of report file contents.

3) while waiting for the processes to finish their work, I need to know things are working and what's going on. I've accomplished this by reporting the data files' sizes in MB.

4) the executable isn't designed for a centralized code base (typically all scripts are kept in a folder for the specific project or pit), so it only allows about 100 character columns in the file paths sent to it. I've omitted this from my sanitized version of the code, but it made things even messier than they are below. Also, I don't know if all Windows programs do this, but the paths need to be inside quotes - the path kept breaking on the colon (:) when not quoted.
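
Points 2 and 3 above boil down to two small checks: does the report file contain the end marker yet, and how many megabytes has the data file reached. A minimal sketch with throwaway files follows. The marker text, the file names, and the function names are placeholders, not the vendor's.

```python
import os
import shutil
import tempfile

END_TEXT = 'END REPORT'   # placeholder marker, not the real report text


def size_mb(path):
    """Whole megabytes, by shifting the byte count right 20 bits."""
    return os.stat(path).st_size >> 20


def report_finished(path, end_text=END_TEXT):
    """True once the report file exists and contains the marker."""
    if not os.path.isfile(path):
        return False
    with open(path, 'r') as f:
        return end_text in f.read()


# Small demonstration in a temporary directory.
tmpdir = tempfile.mkdtemp()
rpt = os.path.join(tmpdir, 'demo.rpt')
dat = os.path.join(tmpdir, 'demo.csv')

before = report_finished(rpt)          # report not written yet
with open(dat, 'wb') as f:
    f.write(b'x' * (3 * 1024 * 1024 + 10))
with open(rpt, 'w') as f:
    f.write('line 1\nEND REPORT\n')
after = report_finished(rpt)
mb = size_mb(dat)

print(before, after, mb)
shutil.rmtree(tmpdir)
```

Reading the whole report on every check is wasteful for large files, but for a short text report polled every few seconds it is simple and hard to get wrong.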

Basically, this is a fairly ugly problem and a script that requires babysitting while it runs. That's OK; it beats the alternative (running it sequentially while watching each run). I've tried to adhere to DRY (don't repeat yourself) as much as possible, but I suspect this could be improved upon.

The reason why I blog it is that I suspect there are other people out there who have to do the same sort of thing with their data. It doesn't have to be a mining problem. It can be anything that requires intensive computation across voluminous data with an executable not designed with a Python API.

Notes:

1) I've omitted the file multirunparameters.py that's in an import statement. It has a bunch of paths and names that are relevant to my project, but not to the reader's programming needs.

2) Python 2.7 appears in the shebang line at the top of the file as "mpython." This is the Python that our mine planning vendor ships that ties into their quite capable Python API. The executable I call with subprocess.Popen() is a Windows executable provided by a consultant independent of the mine planning vendor. It just makes sense to package this interpolation inside the mine planning vendor's multirun (~ batch file) framework as part of an overall working of the 3D geologic block model. The script exits as soon as this part of the batch is complete. I've inserted a 10 second pause at the end just to allow a quick look before it disappears.

#!C:/MineSight/x64/mpython

"""
Interpolate grades with <consultant> program
from text files.
"""

import argparse
import subprocess as subx
import os
import collections as colx
import time
from datetime import datetime as dt

# Lookup file of constants, pit names, assay names, paths, etc.
import multirunparameters as paramsx

parser = argparse.ArgumentParser()
# 4 letter argument like 'kwat'
# Feed in at command line.
parser.add_argument('pit', help='four letter, lower case pit abbreviation (kwat)', type=str)
args = parser.parse_args()
PIT = args.pit

pitdir = paramsx.PATHS[PIT]
pathx = paramsx.BASEPATH.format(pitdir)
controlfilepathx = paramsx.CONTROLFILEPATH.format(pitdir)

timestart = dt.now()
print(timestart)

PROGRAM = 'C:/MSPROJECTS/EOMReconciliation/2014/Multirun/AllPits/consultantprogram.exe'

ENDTEXT = 'END <consultant> REPORT'

# These names are the only real difference between pits.
# Double quote is for subprocess.Popen object's stdin.write method
# - Windows path breaks on colon without quotes.
ASSAY1DRIVER = 'KDriverASSAY1_{:s}CBT.csv"'.format(PIT)
ASSAY2DRIVER = 'KDriverASSAY2_{:s}CBT.csv"'.format(PIT)
ASSAY3DRIVER = 'KDriverASSAY3_{:s}CBT.csv"'.format(PIT)
ASSAY4DRIVER = 'KDriverASSAY4_{:s}CBT.csv"'.format(PIT)
ASSAY5DRIVER = 'KDriverASSAY5_{:s}CBT.csv"'.format(PIT)

RETCHAR = '\n'

ASSAY1 = 'ASSAY1'
ASSAY2 = 'ASSAY2'
ASSAY3 = 'ASSAY3'
ASSAY4 = 'ASSAY4'
ASSAY5 = 'ASSAY5'

NAME = 'name'
DRFILE = 'driver file'
OUTPUT = 'output'
DATFILE = 'data file'
RPTFILE = 'report file'

# data, report files
ASSAY1K = 'ASSAY1K.csv'
ASSAY1RPT = 'ASSAY1.RPT'
ASSAY2K = 'ASSAY2K.csv'
ASSAY2RPT = 'ASSAY2.RPT'
ASSAY3K = 'ASSAY3K.csv'
ASSAY3RPT = 'ASSAY3.RPT'
ASSAY4K = 'ASSAY4K.csv'
ASSAY4RPT = 'ASSAY4.RPT'
ASSAY5K = 'ASSAY5K.csv'
ASSAY5RPT = 'ASSAY5.RPT'

OUTPUTFMT = '{:s}output.txt'

ASSAYS = {1:{NAME:ASSAY1,
 DRFILE:controlfilepathx + ASSAY1DRIVER,
 OUTPUT:pathx + OUTPUTFMT.format(ASSAY1),
 DATFILE:pathx + ASSAY1K,
 RPTFILE:pathx + ASSAY1RPT},
 2:{NAME:ASSAY2,
 DRFILE:controlfilepathx + ASSAY2DRIVER,
 OUTPUT:pathx + OUTPUTFMT.format(ASSAY2),
 DATFILE:pathx + ASSAY2K,
 RPTFILE:pathx + ASSAY2RPT},
 3:{NAME:ASSAY3,
 DRFILE:controlfilepathx + ASSAY3DRIVER,
 OUTPUT:pathx + OUTPUTFMT.format(ASSAY3),
 DATFILE:pathx + ASSAY3K,
 RPTFILE:pathx + ASSAY3RPT},
 4:{NAME:ASSAY4,
 DRFILE:controlfilepathx + ASSAY4DRIVER,
 OUTPUT:pathx + OUTPUTFMT.format(ASSAY4),
 DATFILE:pathx + ASSAY4K,
 RPTFILE:pathx + ASSAY4RPT},
 5:{NAME:ASSAY5,
 DRFILE:controlfilepathx + ASSAY5DRIVER,
 OUTPUT:pathx + OUTPUTFMT.format(ASSAY5),
 DATFILE:pathx + ASSAY5K,
 RPTFILE:pathx + ASSAY5RPT}}

DELFILE = 'delete file'
INTERP = 'interp'
SLEEP = 'sleep'
MSGDRIVER = 'message driver'
MSGRETCHAR = 'message return character'
FINISHED1 = 'finished one assay'
FINISHEDALL = 'finished all interpolations'
TIMEELAPSED = 'time elapsed'
FILEEXISTS = 'report file exists'
DATSIZE = 'data file size'
DONE = 'number interpolations finished'
DATFILEEXIST = 'data file not yet there'
SIZECHANGE = 'report file changed size'

# for converting to megabyte file size from os.stat()
BITSHIFT = 20
# sleeptime - 5 seconds
SLEEPTIME = 5
FINISHED = 'finished'
RPTFILECHSIZE = """

Report file for {:s}
changed size; killing process . . .
"""

MESGS = {DELFILE:'\n\nDeleting {} . . .\n\n',
 INTERP:'\n\nInterpolating {:s} . . .\n\n',
 SLEEP:'\nSleeping 2 seconds . . .\n\n',
 MSGDRIVER:'\n\nWriting driver file name to stdin . . .\n\n',
 MSGRETCHAR:'\n\nWriting retchar to stdin for {:s} . . .\n\n',
 FINISHED1:'\n\nFinished {:s}\n\n',
 FINISHEDALL:'\n\nFinished interpolation.\n\n',
 TIMEELAPSED:'\n\n{:d} elapsed seconds\n\n',
 FILEEXISTS:'\n\nReport file for {:s} exists . . .\n\n',
 DATSIZE:'\n\nData file size for {:s} is now {:d}MB . . .\n\n',
 DONE:'\n\n{:d} out of {:d} assays are finished . . .\n\n',
 DATFILEEXIST:"\n\n{:s} doesn't exist yet . . .\n\n",
 SIZECHANGE:RPTFILECHSIZE}

def cleanslate():
    """
    Delete all output files prior to interpolation
    so that their existence can be tracked.
    """
    for key in ASSAYS:
        files = (ASSAYS[key][DATFILE],
                 ASSAYS[key][RPTFILE],
                 ASSAYS[key][OUTPUT])
        for filex in files:
            print(MESGS[DELFILE].format(filex))
            if os.path.exists(filex) and os.path.isfile(filex):
                os.remove(filex)
    return 0

def startprocess(assay):
    """
    Start <consultant program> run for given interpolation.
    Return subprocess.Popen object,
    file object (output file).
    """
    print(MESGS[INTERP].format(ASSAYS[assay][NAME]))
    # XXX - I hate time.sleep - hack
    # XXX - try to re-route standard output so that
    # it's not all jumbled together.
    print(MESGS[SLEEP])
    time.sleep(2)
    # output file for stdout
    f = open(ASSAYS[assay][OUTPUT], 'w')
    procx = subx.Popen('{0}'.format(PROGRAM), stdin=subx.PIPE, stdout=f)
    print(MESGS[SLEEP])
    time.sleep(2)
    # XXX - problem, starting up Excel CBT 22JUN2014
    # Ah - this is what happens when the <software usb licence>
    # key is not attached :-(
    print(MESGS[MSGDRIVER])
    print('\ndriver file = {:s}\n'.format(ASSAYS[assay][DRFILE]))
    procx.stdin.write(ASSAYS[assay][DRFILE])
    print(MESGS[SLEEP])
    time.sleep(2)
    # XXX - this is so jacked up -
    # no idea what is happening when
    print(MESGS[MSGRETCHAR].format(ASSAYS[assay][NAME]))
    procx.stdin.write(RETCHAR)
    print(MESGS[SLEEP])
    time.sleep(2)
    print(MESGS[MSGRETCHAR].format(ASSAYS[assay][NAME]))
    procx.stdin.write(RETCHAR)
    print(MESGS[SLEEP])
    time.sleep(2)
    return procx, f

def crosslookup(assay):
    """
    From assay string, get numeric
    key for ASSAYS dictionary.
    Returns integer.
    """
    for key in ASSAYS:
        if assay == ASSAYS[key][NAME]:
            return key
    return 0

def checkprocess(assay, assaydict):
    """
    Check to see if assay
    interpolation is finished.
    assay is the item in question
    (ASSAY1, ASSAY2, etc.).
    assaydict is the operating dictionary
    for the assay in question.
    Returns True if finished.
    """
    # Report file indicates process finished.
    assaykey = crosslookup(assay)
    rptfile = ASSAYS[assaykey][RPTFILE]
    datfile = ASSAYS[assaykey][DATFILE]
    if os.path.exists(datfile) and os.path.isfile(datfile):
        # Report size of file in MB.
        datfilesize = os.stat(datfile).st_size >> BITSHIFT
        print(MESGS[DATSIZE].format(assay, datfilesize))
    else:
        # Doesn't exist yet.
        print(MESGS[DATFILEEXIST].format(datfile))
    if os.path.exists(rptfile) and os.path.isfile(rptfile):
        # XXX - not the most efficient way,
        # but this checking the file appears
        # to work best.
        f = open(rptfile, 'r')
        txt = f.read()
        f.close()
        # XXX - hack - gah.
        if txt.find(ENDTEXT) > -1:
            # looking for change in reportfile size
            # or big report file
            print(MESGS[SIZECHANGE].format(assay))
            print(MESGS[SLEEP])
            time.sleep(2)
            return True
    return False

PROCX = 'process'
OUTPUTFILE = 'output file'

# Keeps track of files and progress of <consultant program>.
opdict = colx.OrderedDict()

# get rid of preexisting files
cleanslate()

# start all five roughly in parallel
# ASSAYS keys are numbers
for key in ASSAYS:
    # opdict - ordered with assay names as keys
    namex = ASSAYS[key][NAME]
    opdict[namex] = {}
    assaydict = opdict[namex]
    assaydict[PROCX], assaydict[OUTPUTFILE] = startprocess(key)
    # Initialize active status of process.
    assaydict[FINISHED] = False

# For count.
numassays = len(ASSAYS)
# Loop until all finished.
while True:
    # Cycle until done then break.
    # Sleep SLEEPTIME seconds at a time and check between.
    time.sleep(SLEEPTIME)
    # Count.
    i = 0
    for key in opdict:
        assaydict = opdict[key]
        if not assaydict[FINISHED]:
            status = checkprocess(key, assaydict)
            if status:
                # kill process when report file changes
                opdict[key][PROCX].kill()
                assaydict[FINISHED] = True
                i += 1
        else:
            i += 1
    print(MESGS[DONE].format(i, numassays))
    # all done
    if i == numassays:
        break

print('\n\nFinished interpolation.\n\n')
timeend = dt.now()
elapsed = timeend - timestart

print(MESGS[TIMEELAPSED].format(elapsed.seconds))
# Floor division keeps the result an int for the {:d} format.
print('\n\n{:d} elapsed minutes\n\n'.format(elapsed.seconds // 60))

# Allow quick look at screen.
time.sleep(10)
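
The overall shape of the script (start several processes, then loop with a short sleep until every one is accounted for) can be tried with stand-in children. This is only a sketch under assumptions: the job names and durations are invented, and because these children exit on their own it uses Popen.poll() to detect completion, whereas the script above watches a report file and then kills the process.

```python
import subprocess
import sys
import time

# Stand-in jobs: each child just sleeps briefly. Hypothetical.
JOBS = {'A': 0.2, 'B': 0.4, 'C': 0.1}
CHILD = 'import sys, time; time.sleep(float(sys.argv[1]))'

# Start all jobs roughly in parallel.
procs = {}
for name, secs in JOBS.items():
    procs[name] = subprocess.Popen([sys.executable, '-c', CHILD, str(secs)])

# Check between short sleeps until every job has finished.
finished = set()
while len(finished) < len(procs):
    time.sleep(0.05)
    for name, proc in procs.items():
        # poll() returns None while the child is still running.
        if name not in finished and proc.poll() is not None:
            finished.add(name)
            print('{:s} done; {:d} of {:d} finished'.format(
                name, len(finished), len(procs)))
```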
