Sunday, November 16, 2014

Polygon Offset With pyeuclid Revisited

A few years back I did two or three posts on polygon offset.  It was a learning experience that I never quite completed to my satisfaction.  A kind visitor to my last post on the subject, Mr. Ahmad Rafsanjani, actually rewrote some of my code in a comment.  I gave him a polite weasel answer thanking him, but dropped the effort and never felt quite right about it.

Well, as the saying goes, better late than never.  He was quite correct in his assessment, but my understanding of vector math was not strong enough to prove this to myself.  I was visually inspecting the results, and, given what I was dealing with at the time, they seemed OK.

Here is the picture we're trying to get (this is with Mr. Rafsanjani's code; the output of my original code, although wrong, does not look very different):

In order to nail down the discrepancy in my original code, I inserted some print statements with a lot of numeric precision (28 digits to the right of the decimal) in the output:

$ more points
1.2231671842700024832595318003      1.7024195134850139687898717966
2.1231671842700023944416898303      1.7024195134850139687898717966
2.2768328157299975167404681997      2.5475804865149860312101282034
1.6635803619063778135966913396      2.5493839809701555054743948858
1.7364196380936223196300716154      3.3506160190298444057077631442
2.5205825797292722434406186949      3.3529128986463621053815131745
2.6794174202707274901058553951      4.1470871013536383387076966756
2.1360193516544989655869812850      4.1228847880778562995374159073


The numbers highlighted in yellow are mismatches in the Y-coordinates of points of the inset offset polygon - each pair of Y coordinates should represent lines parallel to the X axis; in other words, they should be equal.  I have a bug.

Contrast that with the numbers yielded by Mr. Rafsanjani's code:

$ more points
1.2251864530113494300422871675      1.7000000000000001776356839400
2.1251864530113491191798402724      1.7000000000000001776356839400
2.2797319075568038826418160170      2.5499999999999998223643160600
1.6642549229616445671808833140      2.5499999999999998223643160600
1.7369821956889173186766583967      3.3500000000000000888178419700
2.5229705854077835169846366625      3.3500000000000000888178419700
2.6829705854077836590931838145      4.1500000000000003552713678801
2.1880983342360056376207921858      4.1500000000000003552713678801
2.6780983342360054066944030637      4.8499999999999996447286321199
3.1219016657639944156699129962      4.8499999999999996447286321199


Much better.  Lines that are supposed to be perfectly parallel to the X axis are, at least to 28 printed decimal places and within the limits of my platform and the CPython interpreter, parallel to the X axis.  For what I am doing, I can more than live with that.
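
The comparison between the two listings can also be done mechanically rather than by eye.  What follows is only a sketch, assuming (as in the listings above) that consecutive points come in pairs that should share a Y coordinate.  The name mismatchedpairs and the tolerance are illustrative and not part of the original code, and the coordinates are the first eight points of each listing rounded to ten decimal places:

```python
def mismatchedpairs(points, tol=1e-9):
    """
    Return the indices of the point pairs (0-1, 2-3, ...)
    whose Y coordinates differ by more than tol.
    """
    bad = []
    for i in range(0, len(points) - 1, 2):
        if abs(points[i][1] - points[i + 1][1]) > tol:
            bad.append(i // 2)
    return bad

# First eight points of each listing, rounded to ten decimal places.
ORIGINAL = [(1.2231671843, 1.7024195135), (2.1231671843, 1.7024195135),
            (2.2768328157, 2.5475804865), (1.6635803619, 2.5493839810),
            (1.7364196381, 3.3506160190), (2.5205825797, 3.3529128986),
            (2.6794174203, 4.1470871014), (2.1360193517, 4.1228847881)]
CORRECTED = [(1.2251864530, 1.7), (2.1251864530, 1.7),
             (2.2797319076, 2.55), (1.6642549230, 2.55),
             (1.7369821957, 3.35), (2.5229705854, 3.35),
             (2.6829705854, 4.15), (2.1880983342, 4.15)]

print(mismatchedpairs(ORIGINAL))
print(mismatchedpairs(CORRECTED))
```

The first call reports every pair after the first; the second reports none.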

I've included Mr. Rafsanjani's comments in the code.  My modifications to his code were mainly for the purpose of printing some things out and organizing the polygon offset part of this exercise into a module.

I've made a separate main script for gnuplot.  After not looking at everything for three years I realized I had forgotten everything I ever knew about gnuplot and wanted to record it this time.  The file with the 20 points for the shape is available on request.

Here is the main pyeuclid/polygon offset part of the code:

"""
Polygon offset problem using
pyeuclid and incorporating corrections
made by Ahmad Rafsanjani.
"""

# Mr. Rafsanjani's comments:

# I think there is a small bug:

# In "getinsetpoint", the vector v3 should be
# normalized before passing to "scaleadd".

# Furthermore, the final offset is not as the
# prescribed OFFSET and the angle between
# vectors should be taken into account.

# A possible solution could be:

import euclid as eu
import math
import copy

import monastery as pic

OFFSET = 0.15

def scaleadd(origin, offset, vectorx):
    """
    From a vector representing the origin,
    a scalar offset, and a vector, returns
    a Vector3 object representing a point
    offset from the origin.

    (Multiply vectorx by offset and add to origin.)
    """
    multx = vectorx * offset
    return multx + origin

def getinsetpoint(pt1, pt2, pt3):
    """
    Given three points that form a corner (pt1, pt2, pt3),
    returns a point offset distance OFFSET to the right
    of the path formed by pt1-pt2-pt3.
    pt1, pt2, and pt3 are 2-tuples.
    Returns a Vector3 object.
    """
    origin = eu.Vector3(pt2[0], pt2[1], 0.0)
    v1 = eu.Vector3(pt1[0] - pt2[0], pt1[1] - pt2[1], 0.0)
    v2 = eu.Vector3(pt3[0] - pt2[0], pt3[1] - pt2[1], 0.0)
    v3 = copy.copy(v1)
    v1 = v1.cross(v2)
    v3 += v2
    cs =
    a1 = cs * v2
    a2 = v3 - a1
    if cs > 0:
        alpha = math.sqrt(a2.magnitude_squared())
    else:
        alpha = -math.sqrt(a2.magnitude_squared())
    if v1.z < 0.0:
        return scaleadd(origin, -1.0 * OFFSET/alpha, v3)
    else:
        return scaleadd(origin, OFFSET/alpha, v3)

def generatepoints():
    """
    Create list of offset points
    (pyeuclid.Vector3 objects) for
    points inset from polygon.

    Return list.
    """
    polyinset = []
    lenpolygon = len(pic.MONASTERY)
    i = 0
    poly = pic.MONASTERY
    while i < lenpolygon - 2:
        polyinset.append(getinsetpoint(poly[i],
                     poly[i + 1], poly[i + 2]))
        i += 1
                 poly[0], poly[1]))
                 poly[1], poly[2]))

    return polyinset
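
The geometry behind Mr. Rafsanjani's comment (the offset has to take the angle between the two edge vectors into account) can be shown without pyeuclid.  The following is a sketch in plain Python and not the code above.  The names insetpoint and _unit are hypothetical.  It normalizes both edge vectors, adds them to get the bisector, and divides the offset by the sine of the corner angle so that the result sits exactly the offset distance from both edges:

```python
import math

def insetpoint(pt1, pt2, pt3, offset):
    """
    Return the (x, y) point that lies offset to the right of
    both edges of the path pt1 -> pt2 -> pt3, near corner pt2.
    """
    # Unit vectors pointing from the corner toward each neighbor.
    u1 = _unit(pt1[0] - pt2[0], pt1[1] - pt2[1])
    u2 = _unit(pt3[0] - pt2[0], pt3[1] - pt2[1])
    # Their sum lies along the bisector of the corner angle.
    bx, by = u1[0] + u2[0], u1[1] + u2[1]
    # z component of u1 x u2: the sine of the corner angle,
    # signed by the direction of the turn.
    sine = u1[0] * u2[1] - u1[1] * u2[0]
    if abs(sine) < 1e-12:
        raise ValueError('points are collinear; no corner to inset')
    # The tip of the bisector sits |sine| away from each edge, so
    # dividing by sine rescales to offset and picks the right side.
    scale = offset / sine
    return pt2[0] + scale * bx, pt2[1] + scale * by

def _unit(x, y):
    """Scale (x, y) to length 1."""
    length = math.hypot(x, y)
    return x / length, y / length

# Corner of a unit square traversed clockwise, inset by 0.15.
print(insetpoint((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), 0.15))
```

For the square corner the result lands 0.15 in from both edges.  Collinear points are rejected because there is no corner angle to divide by.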

The file that prints stuff out and summons gnuplot:

"""
Write vector points to file.

Show in gnuplot.
"""

# import blogpost as vecx
import rafsanjanicorrection as vecx
import os

# We're using gnuplot.
# It doesn't like commas, so
# we'll use whitespace (6).
FMT = '{0:30.28f}      {1:30.28f}'
FILEX = 'points'
ORIGSHAPE = 'originalshape'

PLOTCMD = 'set xrange[0.0:6.0]\n'
PLOTCMD += 'set yrange[0.0:6.0]\n'
PLOTCMD += 'plot "{0:s}" with lines lt rgb "red" lw 4, '
PLOTCMD += '"{1:s}" with lines lt rgb "blue" lw 4'
GNUPLOTFILE = 'plotfile'
GNUPLOT = 'gnuplot -p {:s}'.format(GNUPLOTFILE)

pts = vecx.generatepoints()
f = open(FILEX, 'w')
i = 1
for ptx in pts:
    print('Printing point {0:d} . . .'.format(i))
    print >> f, FMT.format(ptx.x, ptx.y)
    i += 1
f.close()
# Plot original as well.
# XXX - repetitive - make function.
i = 0
f = open(ORIGSHAPE, 'w')
for ptx in vecx.pic.MONASTERY:
    print('Printing point {0:d} of original shape . . .'.format(i))
    print >> f, FMT.format(ptx[0], ptx[1])
    i += 1
f.close()

f = open(GNUPLOTFILE, 'w')
print >> f, PLOTCMD.format(ORIGSHAPE, FILEX)
# Close so the commands are flushed before gnuplot reads them.
f.close()
os.system(GNUPLOT)
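
The two write loops in the script differ only in the file name and in how the coordinates are pulled out of each point, so they could fold into one helper.  This is a sketch only: writepoints is a hypothetical name, not part of the original script.  It uses f.write() inside a with block, so the file is closed before gnuplot reads it and the same code runs under Python 2.7 and Python 3:

```python
# Same format string as the script: whitespace, not commas, for gnuplot.
FMT = '{0:30.28f}      {1:30.28f}'

def writepoints(filename, points):
    """
    Write one whitespace-separated x y pair per line.
    points is any iterable of (x, y) pairs.
    Returns the number of points written.
    """
    count = 0
    with open(filename, 'w') as f:
        for x, y in points:
            f.write(FMT.format(x, y) + '\n')
            count += 1
    return count
```

With this, the inset polygon would be written with something like writepoints(FILEX, [(ptx.x, ptx.y) for ptx in pts]) and the original shape with writepoints(ORIGSHAPE, vecx.pic.MONASTERY).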

pyeuclid, to the best of my knowledge, runs only in Python 2.7 at the moment.  In any case, I got an error on the Python 3.4 install, so I stuck with 2.7.

Thanks to Mr. Rafsanjani for his help with this and to the rest of you for stopping by.

Monday, November 3, 2014

MeetBSD California 2014 Recap

I am returning from MeetBSD in San Jose, California.  This isn't a Python-related post per se, but the BSD family of operating systems maintains packages and ports for Python and Python third party libraries, and use of Python on these systems is significant both in the open source development and commercial spheres.

The structure of the conference is a brief weekend unconference.  Nonetheless some of the talks were more than worthy of a full fledged mega-con, and the rest were quality.  It was a good deal.

Venue:  the conference was held at Western Digital.   WD sells a variety of hardware.  The product they were pushing was a several terabyte little box that updates wirelessly (but not by Bluetooth).

We met in a rectangular conference room.  All of Silicon Valley seems to me to be an endless office park with nice weather and some landscaped spots (I've included the obligatory Strelitzia/bird of paradise pic from the conference hotel entrance below).  It was a fairly intimate setting.  The food (a variety of sandwiches) was good.  We were warned ahead of time that Wifi was limited; I brought my own Verizon jetpack unit so it wasn't an issue for me.

Talks (that I attended):

1) Rick Reed, “WhatsApp: Half a billion unsuspecting FreeBSD users” - how WhatsApp uses Erlang and FreeBSD to scale.  Now 600 million users.  It was a good talk, but I wasn't awake and some of it went over my head.

2) Jordan Hubbard, “FreeBSD: The Next 10 Years” Good talk; I hated it :-(

Hubbard's leaving Apple a couple years ago and signing on with iXSystems (a sponsor and essentially the organizer of this conference) made a big splash.  He is an accomplished dev and a good guy by all accounts.  His ideas are, on many levels, entirely valid.

I am primarily an OpenBSD user.  I run FreeBSD on my RPi and on a spare laptop for easy access to Java.  The two OS's have similar philosophies in some respects (correctness, BSD license, etc.).  There is cross-pollination when it comes to operating system components, apps, and drivers.  But where OpenBSD unapologetically maintains new releases for older hardware and uncompromisingly adheres to its leader's approach to security and development, FreeBSD in the framework of Hubbard's talk is looking more towards the future and making changes to attract younger talented core committers and target more modern (read mobile) platforms.  Telemetry, scrapping development on older platforms "ruthlessly," getting younger devs involved by providing work that's interesting to them - all this stuff is important for FreeBSD going forward.  At one point he even <gasp> suggested systemd as a good strategy for Linux that FreeBSD should, at least in principle if not in form, emulate.

FreeBSD is everywhere - or at least in a lot of places that companies just don't make a big deal of.  Inside cable (connections) was the one example given.  In order to accommodate mobile and embedded environments, the OS, although well suited to these platforms now, needs to change.

A lot of this in my mind goes against OpenBSD's philosophy - purity and security at all costs.  My personal philosophy lies with the OpenBSD approach, but I may well be wrong.  Hubbard is a guy with a lot of industry know-how and experience and I am a geologist who uses OpenBSD.  He is probably right, but I don't want my fun to stop, so I'm sticking with OpenBSD even if death awaits us . . .

3) David Maxwell, "The Unix command pipeline - using Unix in the renewable energy era"

I always liked Maxwell.  He's a Canadian guy and a NetBSD devotee.

His talk was about a command line app he's putting together for better tracking piped commands on the UNIX command line and reproducing, referencing, and inspecting them retroactively in a way that's easier than what you have to do now.  I think it's got potential and would like to see it succeed.

After the angst I felt over Hubbard's talk, this was a welcome relief.  The UNIX command line is something everyone, or most everyone at the con knows and loves.  Everyone uses piped commands.  This is a useful approach to a common problem - that's something we can all agree on.  My favorite talk of the conference (that I attended).

4) Alex Rosenberg, "Meet PlayStation 4"

By far and away the coolest talk.  Rosenberg presented this well and spoke honestly and as openly as he could as a member of a big commercial project about specifics.  Games require so much optimization at such a low level.  Although this theme came up in a number of the talks, on the PlayStation project it's critical.  Essentially, the best hardware and hardware architecture for the project is selected for a given product lifecycle (10 years?  IIRC) then you hammer at it with software modifications to get every last bit of efficiency out of it.

It's not like there's a standard laptop install of FreeBSD on PlayStation 4 and you let it rip with your happy traditional UNIX OS. They're optimizing LLVM and clang (the compiler and linkers), talking directly to the metal as much as possible, and just generally nailing performance at the lowest level of the architecture (after they've gotten the low hanging fruit up top, of course).

Another theme that came up in almost all the talks, but especially in this one, was the BSD license.  Granted, it was a BSD conference, so organizers and attendees have a bias.  Nonetheless, it appears that licensing is really critical in the decision to adopt open source software and operating systems.  "Business friendly" nowadays often has "capitalism at its worst" overtones.  Still, it was a theme:  the BSD license is the "business friendly" one whereas the GPL, particularly the GPLv3, is not . . .

I'm not a gamer, but I enjoyed this.  Rosenberg is really easy to talk to as well.  He let me take that pic up close when we were posing for the group pic after his talk.

5) Brendan Gregg, "Performance Analysis"

Gregg works for Netflix.  He's written a lot of DTrace scripts (including numerous Python ones) and has them readily available on GitHub.

I found myself wishing I knew more about the subject, because performance monitoring is a really cool netadmin problem when, like Netflix, you're dealing with huge bandwidth challenges (as in other talks, so much comes down to optimization).

That said, Gregg presented some graphical tools that are useful (I'll get the names wrong, so I won't try) - basically histogram-like, color coded performance charts with labels for processes.  You don't have to run your own Netflix to benefit from these and he's made everything open source and available.  If I were a netadmin I would jump on this.  I've got to get smarter first before I can benefit from these tools.

Gregg has a soft British accent and a very amiable demeanor.  He was the first talk in the morning.  It was like a lullaby.  This is one I need to revisit on the videos posted online because it's worth it.

6) Corey Vixie, "Web Apps on Embedded BSD..."

The iXSystems surprise talk, but a good one.  The youngster Vixie briefed us a bit on what iXSystems is doing with web presentation layer (for lack of a better description) of the FreeNAS implementation.

He started off by saying static web pages are, at least for apps like FreeNAS, not the way to go anymore.  Refreshing the DOM (Document Object Model) at regular intervals is not going to work well.  He then introduced us to a number of mature and nascent JavaScript/web technologies, some of which no one in the room had yet heard of.  Basically he had to rewrite the "old" Django/other technologies implementation to accommodate better simulation of a desktop app in the browser.

The specifics were not something I could follow well because of my ignorance.  There was talk of an Open Source, BSD licensed Facebook framework whose name I can't recall, a one-way change propagation architecture for updating the dynamic web page, and, as always, optimization of the process.  I asked him about Django after the talk.  He said it was the best thing a couple years ago for this app, but now they needed something that could interact directly with the browser - namely JavaScript - it comes down to fine-grained control and optimization.

One humorous interlude during the Q & A was my asking him if he was indeed related to Paul Vixie, historical UNIX tools author (Vixie Cron), to which he replied, "This is the part of my talk where I say, 'I am Worf, son of Mogh.'"  Anyone with a sense of humor and a knowledge of STTNG can't be all bad ;-)

A few people pics:

Dru Lavigne.  Without the BSDA cert program she helped found, I would never have gotten over the hump learning UNIX.  We differ on our choice of specific BSD, but I still consider her my UNIX mentor.

iXSystems old timers Denise and Matt working out conference specifics.

FreeBSD Foundation rep Anne.

Conclusion:  MeetBSD is an affordable, pretty meaty con if you like UNIX, hardware, and topics about optimization and scale.  It is, fortunately or unfortunately, a pretty well kept secret.

Thanks for stopping by.

Friday, October 31, 2014

Gtk.TreeView (grid view) with mono, gtk-sharp, and IronPython

The post immediately prior to this one was an attempt to reproduce Windows.Forms Calendar controls in Gtk for cross platform (Windows/*nix) effective rendering.

This time I am attempting to get familiar with gtk-sharp/Gtk's version of a grid view - the Gtk.TreeView object.  Some of the gtk-sharp documentation suggests the NodeView object would be easier to use.  I had some trouble instantiating the objects associated with the NodeView and went with the TreeView instead in the hopes of getting more control.

The Windows.Forms GridView I did years ago is here.  It became apparent to me shortly after embarking on this journey that I would be hard pressed to recreate all the functionality of that script in a timely manner.  I settled for a tabular view of drillhole data (fabricated, mock data) with some custom formatting.

Aside:  this is typically how mineral exploration drillhole data (core, reverse circulation drilling) is presented in tabular format - a series of from-to intervals with assay values.  Assuming the assays are all separate elements, the reported weight percents should not sum to more than 100%, and never do unless someone fat fingers a decimal place.  I've projected a couple screaming hot polymetallic drill holes that end near surface (lack of funding for drilling), but show enough promise that the new mining town of Trachteville (the drill hole name CBT-BNZA stands for CBT-Bonanza) will spring up there at any moment . . . one can dream.
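
The two sanity rules in that aside (each interval runs from a shallower depth to a deeper one, and the weight percents cannot total more than 100) are easy to check in code.  A minimal sketch, with a hypothetical function name and made-up numbers that do not come from the demo data:

```python
def intervalproblems(fromdepth, todepth, assays):
    """
    Return a list of problems found in one from-to interval;
    an empty list means the interval looks sane.
    assays are weight percents of separate elements.
    """
    problems = []
    if todepth <= fromdepth:
        problems.append('to depth is not below from depth')
    total = sum(assays)
    if total > 100.0:
        problems.append('assays sum to {0:.2f}%'.format(total))
    return problems

# Made-up numbers: the second row has a slipped decimal place.
print(intervalproblems(0.0, 8.7, [0.22, 1.31, 0.05, 2.4]))
print(intervalproblems(8.7, 15.3, [0.22, 131.0, 0.05, 2.4]))
```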

The data store object for the grid view, Gtk.ListStore, would not instantiate in IronPython.  I was not the only person to have experienced this problem (I cannot locate the link to the mailing list thread or forum reference, but like the big fish that got away, I swear I saw it).  I didn't want to drop the effort just because of that, so I hacked and compiled some C# code:

public class storex
{
    public Gtk.ListStore drillhole =
                            // 7 columns
                            // drillhole id
          new Gtk.ListStore (typeof (string),
                            // from
                            typeof (double),
                            // to
                            typeof (double),
                            // assay1
                            typeof (double),
                            // assay2
                            typeof (double),
                            // assay3
                            typeof (double),
                            // assay4
                            typeof (double));
}

The mono command on Windows was

C:\UserPrograms\Mono-3.2.3>mcs -pkg:gtk-sharp-2.0 /target:library C:\UserPrograms\IronPythonGUI\storex.cs

Those are my file paths; locations depend on where you install things like mono and IronPython.

Anyway, I got my dll and I was off to the races.  Getting to know the Gtk and gtk-sharp object model proved challenging for me.  I'm glad I got some familiarity with it, but it would take me longer to do something in Gtk than it did with Windows.Forms.  The most fun and gratifying part of the project was getting the custom formatting to work with a Gtk.TreeCellDataFunc.  I used a function that yielded specific functions for each column - something that's really easy to do in Python.
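
The "function that yields functions" idea is ordinary Python closures and can be shown without Gtk.  What follows is a sketch and not the code from the script: makecellformatter is a hypothetical name, it returns a (text, color) pair instead of setting properties on a cell renderer, and the cutoff of 10.0 is a made-up value:

```python
def makecellformatter(floatfmt, cutoff=None):
    """
    Return a function that turns a float into (text, color).
    Values above cutoff come back red; everything else black.
    """
    def formatcell(value):
        text = floatfmt.format(value)
        if cutoff is not None and value > cutoff:
            return text, 'red'
        return text, 'black'
    return formatcell

# One formatter per column, keyed by column index.
# The cutoff of 10.0 is a made-up value for illustration.
formatters = {1: makecellformatter('{:>5.1f}'),
              3: makecellformatter('{:>4.2f}', cutoff=10.0)}

print(formatters[1](8.7))
print(formatters[3](12.5))
```

Each returned function remembers the format string and cutoff it was built with, which is what lets one factory serve every column.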

Anyway, here are a couple screenshots and the IronPython code:

The OpenBSD one below turned out pretty good, but the Windows one had a little double line underneath the first row - it looked as though it was still trying to select that row when I told it specifically not to.  I'm not a design perfectionist Steve Jobs type, but niggling nits like that drive me batty.  For now, though, it's best I publish the code and move on.

#!/usr/local/bin/mono /home/carl/IronPython-2.7.4/ipy64.exe

import clr

GTKSHARP = 'gtk-sharp'
PANGO = 'pango-sharp'

# Mock store C#
STOREX = 'storex'

clr.AddReference(GTKSHARP)
clr.AddReference(PANGO)

# C# module compiled for this project.
# Problems with Gtk.ListStore in IronPython.
clr.AddReference(STOREX)

import Gtk
import Pango

import storex

TITLE = 'Gtk.TreeView Demo (Drillholes)'
MARKUP = '<span font="Courier New" size="14" weight="bold">{:s}</span>'

RIGHT = 1.0


COURFONTREGULAR = 'Courier New 12'
COURFONTBOLD = 'Courier New Bold 12'

DHNAMELABEL = 'drillhole'
FROM = 'from'
TO = 'to'
ASSAY1 = 'assay1'
ASSAY2 = 'assay2'
ASSAY3 = 'assay3'
ASSAY4 = 'assay4'

FP1FMT = '{:>5.1f}'
FP2FMT = '{:>4.2f}'

DHDATAX = {(DHNAME.format(1), 0.0):{TO:8.7,
           (DHNAME.format(1), 8.7):{TO:15.3,
           (DHNAME.format(1), 15.3):{TO:25.3,
           (DHNAME.format(2), 0.0):{TO:10.0,
           (DHNAME.format(2), 10.0):{TO:20.0,



def genericfloatformat(floatfmt, index):
    """
    For cell formatting in Gtk.TreeView.

    Returns a function to format floats
    and to format floats' foreground color
    based on cutoff value.

    floatfmt is a format string.

    index is an int that indicates the
    column being formatted.
    """
    def setfloatfmt(treeviewcolumn, cellrenderer, treemodel, treeiter):
        cellrenderer.Text = floatfmt.format(treemodel.GetValue(treeiter, index))
        # If it is one of the assay value columns.
        # XXX - not generic.
        if index > 2:
            if treemodel.GetValue(treeiter, index) > BLAZINGCUTOFF:
                cellrenderer.Foreground = 'red'
            else:
                cellrenderer.Foreground = 'black'
    return Gtk.TreeCellDataFunc(setfloatfmt)

class TreeViewTest(object):
    def __init__(self):
        self.window = Gtk.Window('')
        # DeleteEvent - copied from Gtk demo on internet.
        self.window.DeleteEvent += self.DeleteEvent
        # Frame property provides a frame and title.
        self.frame = Gtk.Frame(MARKEDUPTITLE)
        self.tree = Gtk.TreeView()
        self.tree.EnableGridLines = Gtk.TreeViewGridLines.Both

        # Fonts for formatting.
        self.fdregular = Pango.FontDescription.FromString(COURFONTREGULAR)
        self.fdbold = Pango.FontDescription.FromString(COURFONTBOLD)

        # C# module = storex().drillhole

        self.tree.Model =


        # Keep text viewable - size no smaller than intended.
        self.window.AllowShrink = False
        # XXX - hack to keep lack of gridlines on edges of
        #       table from showing.
        self.window.AllowGrow = False
        # Unselect everything for this demo.

    def makecolumns(self):
        """
        Fill in columns for TreeView.
        """
        self.columns = {}
        for fieldx in FIELDS:
            self.columns[fieldx] = Gtk.TreeViewColumn()
            self.columns[fieldx].Title = fieldx

    def formatcolumns(self):
        """
        Make custom labels for column headers.

        Get each column properly justified (all
        are right justified, floating point numbers
        except for the drillhole 'number' -
        actually a string).
        """
        self.customlabels = {}

        for fieldx in FIELDS:
            # This centers the labels at the top.
            self.columns[fieldx].Alignment = CENTERED
            self.customlabels[fieldx] = Gtk.Label(self.columns[fieldx].Title)
            # 120 is about right for from, to, and assay columns.
            self.columns[fieldx].MinWidth = 120
            self.columns[fieldx].Widget = self.customlabels[fieldx]
            # ShowAll required for new label to take.

    def formatcells(self):
        """
        Add and format cell renderers.
        """
        self.cellrenderers = {}

        for fieldx in FIELDS:
            self.cellrenderers[fieldx] = Gtk.CellRendererText()
            self.columns[fieldx].PackStart(self.cellrenderers[fieldx], True)
            # Drillhole 'number' (string)
            if fieldx == FIELDS[0]:
                self.cellrenderers[fieldx].Xalign = CENTERED
                        'text', 0)
                self.cellrenderers[fieldx].Xalign = RIGHT
                            'text', FIELDS.index(fieldx))
                except ValueError:
                    print('\n\nProblem with field definitions; field not found.\n\n')
        for fieldx in BOLDEDCOLUMNS:
            self.cellrenderers[fieldx].Font = COURFONTBOLD

        # XXX - not very generic, but better than doing them one by one.
        # from, to columns.
        for x in xrange(1, 3):
                    genericfloatformat(FP1FMT, x))
        # assay<x> columns.
        for x in xrange(3, 7):
                    genericfloatformat(FP2FMT, x))

    def usemarkup(self):
        """
        Refreshes UseMarkup property on widgets (labels)
        so that they display properly and without
        markup text.
        """
        # Have to refresh this property each time.
        self.frame.LabelWidget.UseMarkup = True

    def prettyup(self):
        Get Gtk objects looking the way we
        # Try to get Courier New on treeview.
        # Get rid of line.
        self.frame.Shadow = Gtk.ShadowType.None

    def adddata(self):
        """
        Put data into store.
        """
        # XXX - difficulty figuring out sorting
        #       function for TreeView.  Hack it
        #       with dictionary here.
        keytuples = [key for key in DHDATAX]
        datax = []
        for tuplex in keytuples:
            # XXX - side effect comprehension.
            #       Not great for readability,
            #       but compact.
            [datax.append(x) for x in tuplex]
            for fieldx in NONKEYFIELDS:
            # Reinitialize data row list.
            datax = []

    def DeleteEvent(self, widget, event):

if __name__ == '__main__':
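
The adddata() method above builds each row from a dictionary keyed on (drillhole, from) tuples.  Because tuples sort element by element, sorting the keys orders the rows by hole and then by depth before they go into the store.  This is a sketch of that step only, outside of Gtk, with a hypothetical function name and made-up sample data:

```python
def rowsfromdict(data, nonkeyfields):
    """
    Flatten {(hole, from): {field: value, ...}} into a list of
    rows [hole, from, field values...], ordered by hole name
    and then by from depth.
    """
    rows = []
    for key in sorted(data):
        row = list(key)
        row.extend(data[key][fieldx] for fieldx in nonkeyfields)
        rows.append(row)
    return rows

# Made-up hole name and values for illustration.
SAMPLE = {('DH-1', 8.7): {'to': 15.3, 'assay1': 0.4},
          ('DH-1', 0.0): {'to': 8.7, 'assay1': 1.2}}

for row in rowsfromdict(SAMPLE, ['to', 'assay1']):
    print(row)
```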

Thanks for stopping by.

Thursday, October 30, 2014

Mono gtk-sharp IronPython CalendarView

A number of years ago I did a post on the IronPython Cookbook site about the Windows.Forms Calendar control.  I could never get the thing to render nicely on *nix operating systems (BSD family).  It sounds as though Windows.Forms development for mono (and in general) is kind of dead, so there is not much hope that solution/example will ever render nicely on *nix.  Recently I've been playing with mono and decided to give gtk-sharp a shot with IronPython.

Quick disclaimers:

1) I suspect from the examples I've seen on the internet that PyGtk is a little easier to deal with than gtk-sharp.  That's OK; I wanted to use IronPython and have the rest of the mono/dotNet framework available, so I went through the extra trouble to forgo CPython and PyGtk and go with IronPython and gtk-sharp instead.

2) The desktop is not the most cutting edge or sexy platform in 2014.  Nonetheless, where I work it is alive and well.  When I no longer see engineers hacking solutions in Excel and VBA, I'll consider the possibility of outliving the desktop.  Right now I'm not hopeful :-\

The results aren't bad, at least as far as rendering goes.  I couldn't get the Courier font to take on OpenBSD, but the Gtk Calendar control looks acceptable.  All in all, I was OK with the results on both Windows and OpenBSD.  I've heard Gtk doesn't do quite as well on Apple products, but I don't own a Mac to test with.  Here are a couple screenshots:

I run the cwm window manager on OpenBSD and have it set up to cut out borders on windows, hence the more minimalist look to the control there.

IronPython output on *nix has always come out in yellow or white - it doesn't show up on a white background, which I prefer.  In order to get around this, I run an xterm with a black background:

xterm -bg black -fg white

Here is the code for the gtk-sharp Gtk.Calendar control:

#!/usr/local/bin/mono /home/carl/IronPython-2.7.4/ipy64.exe

import clr

GTKSHARP = 'gtk-sharp'
PANGO = 'pango-sharp'

clr.AddReference(GTKSHARP)
clr.AddReference(PANGO)

import Gtk
import Pango

import datetime

TITLE = 'Gtk.Calendar Demo'
MARKUP = '<span font="Courier New" size="14" weight="bold">{:s}</span>'

INFOMSG = '<span font="Courier New 12">\n\n Program set to run for:\n\n '
INFOMSG += '{:%Y-%m-%d}\n\n</span>'

DATEDIFFMSG = '<span font="Courier New 12">\n\n '
DATEDIFFMSG += 'There are {0:d} days between the\n'
DATEDIFFMSG += ' beginning of the epoch and\n'
DATEDIFFMSG += ' {1:%Y-%m-%d}.\n\n</span>'

ALIGNMENTPARAMS = (0.0, 0.5, 0.0, 0.0)


CALENDARFONT = 'Courier New Bold 12'

class CalendarTest(object):
    inthebeginning = datetime.datetime.utcfromtimestamp(0)
    # Debug info - make sure beginning of epoch really
    #              is midnight, Jan 1, 1970 GMT.
    def __init__(self):
        self.window = Gtk.Window(TITLE)
        # DeleteEvent - copied from Gtk demo on internet.
        self.window.DeleteEvent += self.DeleteEvent
        # Frame property provides a frame and title.
        self.frame = Gtk.Frame(MARKEDUPTITLE)
        self.calendar = Gtk.Calendar()
        # Handles date selection event.
        self.calendar.DaySelected += self.dateselect
        # Sets up text for labels.
        # Puts little box around text.
        self.datelabelframe = Gtk.Frame()
        # Try to get datelabel to align with other label.
        self.datelabelalignment = Gtk.Alignment(*ALIGNMENTPARAMS)
        self.datelabel = Gtk.Label(self.caltext)
        # Puts little box around text.
        self.datedifflabelframe = Gtk.Frame()
        self.datedifflabelalignment = Gtk.Alignment(*ALIGNMENTPARAMS)
        self.datedifflabel = Gtk.Label(self.timedifftext)
        self.vbox = Gtk.VBox()
        # Keep text viewable - size no smaller than intended.
        self.window.AllowShrink = False

    def getcaltext(self):
        """
        Get messages for run date.
        """
        # Calendar month is 0 based.
        yearmonthday = self.calendar.Year, self.calendar.Month + 1, self.calendar.Day
        chosendate = datetime.datetime(*yearmonthday)
        self.caltext = INFOMSG.format(chosendate)
        # For reporting of number of days since beginning of epoch.
        timediff = chosendate - CalendarTest.inthebeginning
        self.timedifftext = DATEDIFFMSG.format(timediff.days, chosendate)

    def usemarkup(self):
        """
        Refreshes UseMarkup property on widgets (labels)
        so that they display properly and without
        markup text.
        """
        # Have to refresh this property each time.
        self.frame.LabelWidget.UseMarkup = True
        self.datelabel.UseMarkup = True
        self.datedifflabel.UseMarkup = True

    def prettyup(self):
        Get Gtk objects looking the way we
        # Try to make frame wider.
        # XXX
        # Works nicely on Windows - try on Unix.
        # Allows bold, etc.
        self.frame.SetSizeRequest(WINDOWWIDTH, -1)
        # Get rid of line in middle of text on title.
        self.frame.Shadow = Gtk.ShadowType.None
        # Try to get Courier New on calendar.
        fd = Pango.FontDescription.FromString(CALENDARFONT)
        self.datelabel.Justify = Gtk.Justification.Left
        self.datedifflabel.Justify = Gtk.Justification.Left
        self.window.Title = ''

    def dateselect(self, widget, event):
        self.datelabel.Text = self.caltext
        self.datedifflabel.Text = self.timedifftext

    def DeleteEvent(self, widget, event):

if __name__ == '__main__':
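
The day count in getcaltext() only needs calendar dates, so the arithmetic can also be done with datetime.date, which sidesteps any question of which time zone the epoch's midnight falls in.  A minimal sketch, with a hypothetical function name, that takes the zero based month the way the Gtk calendar reports it:

```python
import datetime

EPOCH = datetime.date(1970, 1, 1)

def dayssinceepoch(year, month0, day):
    """
    Days between Jan 1, 1970 and the given date.
    month0 is zero based, as the calendar control reports it.
    """
    chosen = datetime.date(year, month0 + 1, day)
    return (chosen - EPOCH).days

# October 30, 2014: month index 9.
print(dayssinceepoch(2014, 9, 30))
```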

Thanks for stopping by. 

Monday, October 20, 2014

subprocess.Popen() or Abusing a Home-grown Windows Executable

Each month I redo 3D block model interpolations for a series of open pits at a distant mine.  Those of you who follow my twitter feed often see me tweet, "The 3D geologic block model interpolation chuggeth . . ."  What's going on is that I've got all the processing power maxed out dealing with millions of model blocks and thousands of data points.  The machine heats up and with the fan sounds like a DC-9 warming up before flight.

All that said, running everything roughly in parallel is more efficient time-wise than running it sequentially.  An hour of chugging is better than four.  The way I've been doing this is using the Python (2.7) subprocess module's Popen class, running my five interpolated values in parallel.  Our Python programmer Lori originally wrote this to run in sequence for a different set of problems.  I bastardized it for my own.

The subprocess part of the code is relatively straightforward.  Function startprocess() in my code covers that.
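
For readers who have not used Popen this way, here is the general shape of the idea.  It is a sketch under stated assumptions and not the startprocess() from my script.  The names startrun, runall, and FAKEEXE are hypothetical, and a small child Python process stands in for the vendor executable: it reads one line from stdin (the config path) and echoes it back.

```python
import subprocess
import sys
import time

# Stand-in for the interactive vendor executable: a child Python
# process that reads one line (the config path) and echoes it back.
FAKEEXE = [sys.executable, '-c',
           'import sys; print(sys.stdin.readline().strip())']

def startrun(configpath, command=FAKEEXE):
    """
    Start one run and answer its prompt for the config file path.
    Returns the Popen object without waiting for it to finish.
    """
    proc = subprocess.Popen(command, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
    # The path goes in quoted, followed by a newline.
    proc.stdin.write('"{0}"\n'.format(configpath))
    proc.stdin.flush()
    return proc

def runall(configpaths, pollseconds=0.1):
    """
    Start every run at once, then poll until all have exited.
    Returns each run's stdout, in the order given.
    """
    procs = [startrun(pathx) for pathx in configpaths]
    while any(proc.poll() is None for proc in procs):
        time.sleep(pollseconds)
    return [proc.communicate()[0].strip() for proc in procs]

if __name__ == '__main__':
    print(runall(['pit1.cfg', 'pit2.cfg']))
```

Popen returns as soon as the child is started, which is what makes the runs overlap.  Reading stdout only after exit is fine for a child that prints one short line; a program that writes a lot to a pipe would need its output drained while it runs.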

What makes this problem a little more challenging:

1) it's a vendor supplied executable we're dealing with . . . without an API or source . . . that's interactive (you can't feed it the config file path; it asks for it).  This results in a number of time.sleep() and <process>.stdin.write() calls that can be brittle.

2) getting the processes started, as I just mentioned, is easy.  Finding out when to stop, or kill them, requires knowledge of the app and how it generates output.  I've gone for an ugly, but effective check of report file contents.

3) while waiting for the processes to finish their work, I need to know things are working and what's going on.  I've accomplished this by reporting the data files' sizes in MB.

4) the executable isn't designed for a centralized code base (typically all scripts are kept in a folder for the specific project or pit), so it only allows about 100 character columns in the file paths sent to it.  I've omitted this from my sanitized version of the code, but it made things even messier than they are below.  Also, I don't know if all Windows programs do this, but the paths need to be inside quotes - the path kept breaking on the colon (:) when not quoted.
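
Items 1 and 4 boil down to "wait a bit, then type a quoted path into the program's stdin."  A minimal sketch of that pattern for Python 3 follows.  The helper name start_interactive, the example path and the stand-in child (a Python one-liner that echoes one line) are assumptions for illustration only; the real executable and its prompts are not shown here.

```python
import os
import subprocess
import sys
import tempfile
import time


def start_interactive(command, path_answer, output_path, pause=0.2):
    """Start command, pause, then write a quoted path to its stdin.

    Returns the Popen object and the open output file.
    """
    outfile = open(output_path, 'w')
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=outfile)
    # Give the program time to ask its question (the brittle part).
    time.sleep(pause)
    # Quote the path so the drive-letter colon does not split it.
    proc.stdin.write('"{0}"\n'.format(path_answer).encode())
    proc.stdin.flush()
    return proc, outfile


# Stand-in for the interactive program: echo one line of input.
CHILD = [sys.executable, '-c',
         'import sys; sys.stdout.write(sys.stdin.readline())']

outpath = os.path.join(tempfile.mkdtemp(), 'output.txt')
proc, outfile = start_interactive(CHILD, 'C:/some/driver.csv', outpath)
proc.stdin.close()
proc.wait()
outfile.close()

with open(outpath) as f:
    echoed = f.read().strip()
print(echoed)
```

Sending each child's stdout to its own file keeps the output of several runs from getting jumbled together on the screen.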

Basically, this is a fairly ugly problem and a script that requires babysitting while it runs.  That's OK; it beats the alternative (running it sequentially while watching each run).  I've tried to adhere to DRY (don't repeat yourself) as much as possible, but I suspect this could be improved upon.
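
The report file check and the file size reporting (items 2 and 3 above) can be sketched on their own like this, again for Python 3.  The marker text, file names and helper names are hypothetical; a right shift by 20 bits divides a byte count by 1,048,576 and drops the remainder.

```python
import os
import tempfile

END_MARKER = 'END REPORT'  # hypothetical end-of-report text
MB_SHIFT = 20              # bytes >> 20 gives whole megabytes


def size_in_mb(path):
    """Return file size in whole MB, or None if the file is not there yet."""
    if not os.path.isfile(path):
        return None
    return os.stat(path).st_size >> MB_SHIFT


def report_finished(path, marker=END_MARKER):
    """Return True once the report file exists and contains the marker."""
    if not os.path.isfile(path):
        return False
    with open(path, 'r') as f:
        return marker in f.read()


# Simulate a run: nothing exists at first, then the files show up.
workdir = tempfile.mkdtemp()
datfile = os.path.join(workdir, 'assay.dat')
rptfile = os.path.join(workdir, 'assay.rpt')

before = (size_in_mb(datfile), report_finished(rptfile))

with open(datfile, 'wb') as f:
    f.write(b'\0' * (3 * 1024 * 1024 + 17))
with open(rptfile, 'w') as f:
    f.write('lots of statistics\nEND REPORT\n')

after = (size_in_mb(datfile), report_finished(rptfile))
print(before, after)
```

Reading the whole report file on every check is not efficient, but report files are small compared to the data files, so it is cheap next to the interpolation itself.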

The reason why I blog it is that I suspect there are other people out there who have to do the same sort of thing with their data.  It doesn't have to be a mining problem.  It can be anything that requires intensive computation across voluminous data with an executable not designed with a Python API.


1) I've omitted the file that's in an import statement.  It has a bunch of paths and names that are relevant to my project, but not to the reader's programming needs.

2) Python 2.7 is listed at the top of the file as "mpython."  This is the Python that our mine planning vendor ships that ties into their quite capable Python API.  The executable I call with subprocess.Popen() is a Windows executable provided by a consultant independent of the mine planning vendor.  It just makes sense to package this interpolation inside the mine planning vendor's multirun (~ batch file) framework as part of an overall working of the 3D geologic block model.  The script exits as soon as this part of the batch is complete.  I've inserted a 10-second pause at the end just to allow a quick look before it disappears.


"""
Interpolate grades with <consultant> program
from text files.
"""

import argparse
import subprocess as subx
import os
import collections as colx

import time
from datetime import datetime as dt

# Lookup file of constants, pit names, assay names, paths, etc.
import multirunparameters as paramsx

parser = argparse.ArgumentParser()
# 4 letter argument like 'kwat'
# Feed in at command line.
parser.add_argument('pit', help='four letter, lower case pit abbreviation (kwat)', type=str)
args = parser.parse_args()
PIT = args.pit

pitdir = paramsx.PATHS[PIT]
pathx = paramsx.BASEPATH.format(pitdir)
controlfilepathx = paramsx.CONTROLFILEPATH.format(pitdir)

timestart = dt.now()

PROGRAM = 'C:/MSPROJECTS/EOMReconciliation/2014/Multirun/AllPits/consultantprogram.exe'

ENDTEXT = 'END <consultant> REPORT'

# These names are the only real difference between pits.
# Double quote is for subprocess.Popen object's stdin.write method
# - Windows path breaks on colon without quotes.
ASSAY1DRIVER = 'KDriverASSAY1{:s}CBT.csv"'.format(PIT)
ASSAY2DRIVER = 'KDriverASSAY2{:s}CBT.csv"'.format(PIT)
ASSAY3DRIVER = 'KDriverASSAY3_{:s}CBT.csv"'.format(PIT)
ASSAY4DRIVER = 'KDriverASSAY4_{:s}CBT.csv"'.format(PIT)
ASSAY5DRIVER = 'KDriverASSAY5_{:s}CBT.csv"'.format(PIT)

RETCHAR = '\n'


NAME = 'name'
DRFILE = 'driver file'
OUTPUT = 'output'
DATFILE = 'data file'
RPTFILE = 'report file'

# data, report files





OUTPUTFMT = '{:s}output.txt'

ASSAYS = {1:{NAME:ASSAY1,
             DRFILE:controlfilepathx + ASSAY1DRIVER,
             OUTPUT:pathx + OUTPUTFMT.format(ASSAY1),
             DATFILE:pathx + ASSAY1K,
             RPTFILE:pathx + ASSAY1RPT},
          2:{NAME:ASSAY2,
             DRFILE:controlfilepathx + ASSAY2DRIVER,
             OUTPUT:pathx + OUTPUTFMT.format(ASSAY2),
             DATFILE:pathx + ASSAY2K,
             RPTFILE:pathx + ASSAY2RPT},
          3:{NAME:ASSAY3,
             DRFILE:controlfilepathx + ASSAY3DRIVER,
             OUTPUT:pathx + OUTPUTFMT.format(ASSAY3),
             DATFILE:pathx + ASSAY3K,
             RPTFILE:pathx + ASSAY3RPT},
          4:{NAME:ASSAY4,
             DRFILE:controlfilepathx + ASSAY4DRIVER,
             OUTPUT:pathx + OUTPUTFMT.format(ASSAY4),
             DATFILE:pathx + ASSAY4K,
             RPTFILE:pathx + ASSAY4RPT},
          5:{NAME:ASSAY5,
             DRFILE:controlfilepathx + ASSAY5DRIVER,
             OUTPUT:pathx + OUTPUTFMT.format(ASSAY5),
             DATFILE:pathx + ASSAY5K,
             RPTFILE:pathx + ASSAY5RPT}}

DELFILE = 'delete file'
INTERP = 'interp'
SLEEP = 'sleep'
MSGDRIVER = 'message driver'
MSGRETCHAR = 'message return character'
FINISHED1 = 'finished one assay'
FINISHEDALL = 'finished all interpolations'
TIMEELAPSED = 'time elapsed'
FILEEXISTS = 'report file exists'
DATSIZE = 'data file size'
DONE = 'number interpolations finished'
DATFILEEXIST = 'data file not yet there'
SIZECHANGE = 'report file changed size'

# for converting to megabyte file size from os.stat()
BITSHIFT = 20

# sleeptime - 5 seconds
SLEEPTIME = 5

FINISHED = 'finished'

MESGS = {DELFILE:'\n\nDeleting {} . . .\n\n',
         INTERP:'\n\nInterpolating {:s} . . .\n\n',
         SLEEP:'\nSleeping 2 seconds . . .\n\n',
         MSGDRIVER:'\n\nWriting driver file name to stdin . . .\n\n',
         MSGRETCHAR:'\n\nWriting retchar to stdin for {:s} . . .\n\n',
         FINISHED1:'\n\nFinished {:s}\n\n',
         FINISHEDALL:'\n\nFinished interpolation.\n\n',
         TIMEELAPSED:'\n\n{:d} elapsed seconds\n\n',
         FILEEXISTS:'\n\nReport file for {:s} exists . . .\n\n',
         DATSIZE:'\n\nData file size for {:s} is now {:d}MB . . .\n\n',
         DONE:'\n\n{:d} out of {:d} assays are finished . . .\n\n',
         DATFILEEXIST:"\n\n{:s} doesn't exist yet . . .\n\n",
         SIZECHANGE:'\n\nReport file for {:s}\nchanged size; killing process . . .\n\n'}

def cleanslate():
    """
    Delete all output files prior to interpolation
    so that their existence can be tracked.
    """
    for key in ASSAYS:
        # data, report files
        files = (ASSAYS[key][DATFILE],
                 ASSAYS[key][RPTFILE])
        for filex in files:
            if os.path.exists(filex) and os.path.isfile(filex):
                print(MESGS[DELFILE].format(filex))
                os.remove(filex)
    return 0

def startprocess(assay):
    """
    Start <consultant program> run for given interpolation.

    Return subprocess.Popen object,
    file object (output file).
    """
    # XXX - I hate time.sleep - hack
    # XXX - try to re-route standard output so that
    #       it's not all jumbled together.
    # output file for stdout
    f = open(ASSAYS[assay][OUTPUT], 'w')
    procx = subx.Popen('{0}'.format(PROGRAM), stdin=subx.PIPE, stdout=f)
    # XXX - problem, starting up Excel CBT 22JUN2014
    #       Ah - this is what happens when the <software usb licence>
    #            key is not attached :-(
    print('\ndriver file = {:s}\n'.format(ASSAYS[assay][DRFILE]))
    # XXX - this is so jacked up -
    #       no idea what is happening when
    return procx, f

def crosslookup(assay):
    """
    From assay string, get numeric
    key for ASSAYS dictionary.

    Returns integer.
    """
    for key in ASSAYS:
        if assay == ASSAYS[key][NAME]:
            return key
    return 0

def checkprocess(assay, assaydict):
    """
    Check to see if assay
    interpolation is finished.

    assay is the item in question
    (ASSAY1, ASSAY2, etc.).

    assaydict is the operating dictionary
    for the assay in question.

    Returns True if finished.
    """
    # Report file indicates process finished.
    assaykey = crosslookup(assay)
    rptfile = ASSAYS[assaykey][RPTFILE]
    datfile = ASSAYS[assaykey][DATFILE]
    if os.path.exists(datfile) and os.path.isfile(datfile):
        # Report size of file in MB.
        datfilesize = os.stat(datfile).st_size >> BITSHIFT
        print(MESGS[DATSIZE].format(assay, datfilesize))
    else:
        # Doesn't exist yet.
        print(MESGS[DATFILEEXIST].format(datfile))
    if os.path.exists(rptfile) and os.path.isfile(rptfile):
        # XXX - not the most efficient way,
        #       but this checking the file appears
        #       to work best.
        f = open(rptfile, 'r')
        txt = f.read()
        f.close()
        # XXX - hack - gah.
        if txt.find(ENDTEXT) > -1:
            # looking for change in reportfile size
            # or big report file
            return True
    return False

PROCX = 'process'
OUTPUTFILE = 'output file'

# Keeps track of files and progress of <consultant program>.
opdict = colx.OrderedDict()

# get rid of preexisting files
cleanslate()

# start all five roughly in parallel
# ASSAYS keys are numbers
for key in ASSAYS:
    # opdict - ordered with assay names as keys
    namex = ASSAYS[key][NAME]
    opdict[namex] = {}
    assaydict = opdict[namex]
    assaydict[PROCX], assaydict[OUTPUTFILE] = startprocess(key)
    # Initialize active status of process.
    assaydict[FINISHED] = False

# For count.
numassays = len(ASSAYS)
# Loop until all finished.
while True:
    # Cycle until done then break.
    # Sleep SLEEPTIME seconds at a time and check between.
    time.sleep(SLEEPTIME)
    # Count.
    i = 0
    for key in opdict:
        assaydict = opdict[key]
        if not assaydict[FINISHED]:
            status = checkprocess(key, assaydict)
            if status:
                # kill process when report file changes
                assaydict[PROCX].kill()
                assaydict[OUTPUTFILE].close()
                assaydict[FINISHED] = True
                i += 1
        else:
            i += 1
    print(MESGS[DONE].format(i, numassays))
    # all done
    if i == numassays:
        break

print('\n\nFinished interpolation.\n\n')
timeend = dt.now()
elapsed = timeend - timestart

print('\n\n{:d} elapsed minutes\n\n'.format(elapsed.seconds // 60))

# Allow quick look at screen.
time.sleep(10)

Sunday, October 12, 2014

Downloading a Bunch of MP3's off the Internet (Foreign Language Tapes)

A mining bud Jen wrote a blog post lamenting the difficulty of learning a foreign language as an adult in a far off land.  This inspired me to clean up my "download the Foreign Service Institute" French "tapes" (mp3's, actually) script I wrote for myself and publish it.

I'm not very astute on web programming.  This script came out of necessity.  There may be other, more efficient ways to do this.  If you have a slow connection a piecemeal approach will probably be required.  It took about 20 minutes to get all these files over a decent Verizon MiFi unit connection (I, unfortunately, don't have speed metrics available).
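
One way to go piecemeal is to skip anything already on disk, so that an interrupted run can simply be started again.  This is only a sketch, and not something the script below does: the helper name fetch_missing is made up, and the demonstration copies from a local file:// URL so that it runs without a network connection.

```python
import os
import pathlib
import tempfile
from urllib import request


def fetch_missing(items, destdir):
    """Download (url, filename) pairs, skipping files already present.

    Returns the list of file names actually fetched.
    """
    fetched = []
    for url, filename in items:
        target = os.path.join(destdir, filename)
        # A non-empty file from an earlier run is left alone.
        if os.path.exists(target) and os.path.getsize(target) > 0:
            continue
        request.urlretrieve(url, target)
        fetched.append(filename)
    return fetched


# Demonstration with a local stand-in for a remote mp3.
srcdir = tempfile.mkdtemp()
destdir = tempfile.mkdtemp()
source = pathlib.Path(srcdir) / 'source.mp3'
source.write_bytes(b'not really audio')

items = [(source.as_uri(), '0101.mp3')]
first = fetch_missing(items, destdir)
second = fetch_missing(items, destdir)
print(first, second)
```

The size check is crude.  A download cut off halfway leaves a non-empty file behind, so a partial file would have to be deleted by hand before rerunning.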

Notes about the downloaded product:  the US State Department's language tapes and lessons were mostly written and produced 30 to 50 years ago.  It's not Rosetta Stone, but I have found them to have value when it comes to practicing pronunciation, including cadence and rhythm of the foreign language - things you just can't get from printed or displayed text.

My late wife gifted me some Spanish tapes prior to the internet age that helped me out.  I am by no means fluent in Spanish, but I can say Hacemos lo que podemos hasta que nos boten (this may not be entirely grammatically correct) to the Spanish speaking mining engineers and get a laugh.

The original names of the mp3's are unnecessarily long and have the appearance of having been created by the Department of Redundancy Department.  It's a government thing, but it does not reflect on the quality of the product.  While the tapes at times are sociologically and technologically dated in their subject matter, the foreign languages haven't changed all that much.

The script:  I used Python 3.4 with the urllib package's request module.  The main challenge was getting the URLs of the mp3's right.  The names are not entirely consistent.  For help with this (I am using Firefox 24.3.0 on OpenBSD 5.4), I right clicked on the mp3's link and selected Inspect Element from the drop-down menu:

The lower left window has the href and the link to the mp3 - if your script is not able to find the file, this is a convenient place to look.
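
Since the inconsistent names are handled with two different format strings, here is a small sketch of the two str.format styles involved.  The templates are shortened, hypothetical versions and leave the base URL out: the first uses keyword fields with zero padding, the second mixes a positional argument with dictionary lookups.

```python
# Keyword fields: {unit:0>2d} pads the integer to two digits with zeros.
TEMPLATE_EARLY = 'Unit {unit:0>2d} {unit:0>2d}.{section:0>2d}.mp3'

# Positional fields: {0:s} is the first argument (a string) and
# {1[unit]} looks up a key in the second argument (a dictionary).
TEMPLATE_LATE = ('Volume {0:s} - Unit {1[unit]:0>2d} '
                 '{1[unit]:0>2d}.{1[section]:d}.mp3')

early = TEMPLATE_EARLY.format(unit=4, section=7)
late = TEMPLATE_LATE.format('Two', {'volume': 2, 'unit': 19, 'section': 3})

print(early)  # Unit 04 04.07.mp3
print(late)   # Volume Two - Unit 19 19.3.mp3
```

Note that the later style does not zero pad the section number, which is the kind of small difference that makes a download fail with a 404.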

This is the whole thing:


from urllib import request

# For getting foreign language study mp3's.
# Main part of URL for French.
MIDDLEURLI = 'French/Basic (Revised)/Volume {volume}/'
MIDDLEURLII = 'French/Basic (Revised)/Volume {0:s}/'
BASEURLEND = 'FSI - French Basic Course (Revised) '

# Format changes inexplicably at chapter 19.
# Grrrr . . .
URLI += '- Volume {volume} - Unit {unit:0>2d} '
URLI += '{unit:0>2d}.{section:0>2d}.mp3'

URLII += '- Volume {1[volume]:d} - Unit {1[unit]:0>2d} '
URLII += '{1[unit]:0>2d}.{1[section]:d}.mp3'

# Format for actual name of mp3 files.
# This is what I wanted for a name - your
# preferences may be different - adjust
# accordingly.
FILENAME = '{unit:0>2d}{section:0>2d}.mp3'

# Texts (pdf format).
# Everything the State Dept. does is a 'StudentText' -
# fair enough.
STUDENTTXT = 'StudentText.pdf'

PDFURLBASICTEXT1 += 'Fsi-FrenchBasicCourserevised-StudentText/'
PDFURLBASICTEXT1 += 'Fsi-FrenchBasicCourserevised-Volume1-'

PDFURLBASICTEXT2 += 'Fsi-FrenchBasicCourserevised-StudentText/'
PDFURLBASICTEXT2 += 'Fsi-FrenchBasicCourserevised-Volume2-'

PDFURLMONDEFR += 'Fsi-LeMondeFrancophone/Fsi-LeMondeFrancophone-'

TWO = 'Two'

pdfs = [PDFURLBASICTEXT1, PDFURLBASICTEXT2, PDFURLMONDEFR]
# Tack on StudentText.pdf to end.
pdfs = [pdfx + STUDENTTXT for pdfx in pdfs]
myfilenames = ['basictext1.pdf', 'basictext2.pdf', 'mondefrancophone.pdf']
# I'm using the dictionary keys for filenames.
pdfs = dict(zip(myfilenames, pdfs))

VOLUME = 'volume'
UNIT = 'unit'
SECTION = 'section'

# volume key, then list of two tuples of unit and
# number of sections
VOLUMES = {1:[(1, 6), (2, 6), (3, 6), (4, 7), (5, 7),
              (6, 3), (7, 11), (8, 10), (9, 11), (10, 9),
              (11, 9), (12, 4)],
           2:[(13, 8), (14, 9), (15, 10), (16, 9), (17, 11),
              (18, 7), (19, 9), (20, 8), (21, 8), (22, 7),
              (23, 8), (24, 6)]}

mp3s = []
for key in VOLUMES:
    for unitsection in VOLUMES[key]:
        for x in range(1, unitsection[1] + 1):
            mp3s.append({VOLUME:key, UNIT:unitsection[0], SECTION:x})

for mp3x in mp3s:
    # Name format change at chapter 19 :-(
    if mp3x[UNIT] > 18:
        urlx = URLII.format(TWO, mp3x)
    else:
        urlx = URLI.format(**mp3x)
    filenamex = FILENAME.format(**mp3x)
    print('Retrieving {0} . . .'.format(urlx))
    request.urlretrieve(urlx, filenamex)

# Add pdf texts at end.
for pdfx in pdfs:
    print('Retrieving {0} . . .'.format(pdfx))
    request.urlretrieve(pdfs[pdfx], pdfx)

print('Everything appears to have downloaded.')
print('Check the directory with the files to be sure.')
As for my French efforts, I've had better luck downloading this stuff than I have learning it.  Nonetheless, a quick message to Guido van Rossum and the other core devs:  transmettez-leur mon meilleur souvenir.