tag:blogger.com,1999:blog-5242304296737655092024-03-05T13:23:11.606-08:00pyrightPython programming language related posts.Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.comBlogger84125tag:blogger.com,1999:blog-524230429673765509.post-71896630343665709082021-08-14T09:37:00.001-07:002021-08-14T09:39:59.289-07:00Embedding an Image in an Outlook Email<p> I had a project where I needed to generate some draft emails programmatically in Outlook.<br /><br />Inserting the company logo and some content-related images took some googling to sort through. Ideally I wanted to encode the images as Base64 strings, but Outlook does not allow this.<br /><br />The code below has some HTML I took from an existing email. Interpolating strings into HTML and, worse, hand-editing it is probably not best practice, but for purposes of this demo, it works. Also, there may be more abstracted tools and libraries for working with Outlook. I'm used to using win32com, so that is my general go-to tool for Microsoft Office and other historically significant desktop Windows apps.<br /><br />Screenshot of the draft email the script generates:</p><div class="separator" style="clear: both; text-align: left;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXK63TnDdm3nciHnV5cwZMC60-sLg5EXjKdEUrMUA1QFGT-Ld20dysH68lukvd5VcpcdB29GhuwCjFabir4LoQ7Fq_9sYHNQyIAfAoYNRIAS8GnBD1zWYd3RCJ8EnwZh8g900hyphenhyphenz8XEK0/s976/draftemail.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="976" data-original-width="514" height="761" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXK63TnDdm3nciHnV5cwZMC60-sLg5EXjKdEUrMUA1QFGT-Ld20dysH68lukvd5VcpcdB29GhuwCjFabir4LoQ7Fq_9sYHNQyIAfAoYNRIAS8GnBD1zWYd3RCJ8EnwZh8g900hyphenhyphenz8XEK0/w402-h761/draftemail.PNG" width="402" /></a></div><br /><p>Code:<br /><br /></p><p><span style="font-family: courier;"><b>"""</b></span></p><p><span style="font-family: courier;"><b>Demo of how 
to embed a picture in a Microsoft Outlook email.</b></span></p><p><span style="font-family: courier;"><b>"""</b></span></p><p><span style="font-family: courier;"><b><br /></b></span></p><p><span style="font-family: courier;"><b>import win32com.client as win32</b></span></p><p><span style="font-family: courier;"><b><br /></b></span></p><p><span style="font-family: courier;"><b>PR_ATTACH_CONTENT_ID = 'http://schemas.microsoft.com/mapi/proptag/0x3712001F'</b></span></p><p><span style="font-family: courier;"><b>PR_ATTACHMENT_HIDDEN = 'http://schemas.microsoft.com/mapi/proptag/0x7FFE000B'</b></span></p><p><span style="font-family: courier;"><b><br /></b></span></p><p><span style="font-family: courier;"><b>PICLOC = r'C:\Users\carl.trachte\Documents\paintbrush.png'</b></span></p><p><span style="font-family: courier;"><b><br /></b></span></p><p><span style="font-family: courier;"><b>BODYFORMAT = """</b></span></p><p><span style="font-family: courier;"><b><html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"></b></span></p><p><span style="font-family: courier;"><b> <head></b></span></p><p><span style="font-family: courier;"><b> <meta http-equiv=Content-Type content="text/html; charset=us-ascii"></b></span></p><p><span style="font-family: courier;"><b> <meta name=Generator content="Microsoft Word 15 (filtered medium)"></b></span></p><p><span style="font-family: courier;"><b> <!--[if !mso]></b></span></p><p><span style="font-family: courier;"><b> <style>v\:* {{behavior:url(#default#VML);}}</b></span></p><p><span style="font-family: courier;"><b> o\:* {{behavior:url(#default#VML);}}</b></span></p><p><span style="font-family: courier;"><b> w\:* {{behavior:url(#default#VML);}}</b></span></p><p><span style="font-family: courier;"><b> .shape {{behavior:url(#default#VML);}}</b></span></p><p><span 
style="font-family: courier;"><b> </style></b></span></p><p><span style="font-family: courier;"><b> <![endif]--></b></span></p><p><span style="font-family: courier;"><b> <style></b></span></p><p><span style="font-family: courier;"><b> <!--</b></span></p><p><span style="font-family: courier;"><b> /* Font Definitions */</b></span></p><p><span style="font-family: courier;"><b> @font-face</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>{{font-family:"Cambria Math";</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>panose-1:2 4 5 3 5 4 6 3 2 4;}}</b></span></p><p><span style="font-family: courier;"><b> @font-face</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>{{font-family:Calibri;</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>panose-1:2 15 5 2 2 2 4 3 2 4;}}</b></span></p><p><span style="font-family: courier;"><b> /* Style Definitions */</b></span></p><p><span style="font-family: courier;"><b> p.MsoNormal, li.MsoNormal, div.MsoNormal</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>{{margin:0in;</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>font-size:11.0pt;</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>font-family:"Calibri",sans-serif;}}</b></span></p><p><span style="font-family: courier;"><b> span.EmailStyle17</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>{{mso-style-type:personal-compose;</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>font-family:"Calibri",sans-serif;</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>color:windowtext;}}</b></span></p><p><span style="font-family: 
courier;"><b> .MsoChpDefault</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>{{mso-style-type:export-only;</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>font-family:"Calibri",sans-serif;}}</b></span></p><p><span style="font-family: courier;"><b> @page WordSection1</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>{{size:8.5in 11.0in;</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>margin:1.0in 1.0in 1.0in 1.0in;}}</b></span></p><p><span style="font-family: courier;"><b> div.WordSection1</b></span></p><p><span style="font-family: courier;"><b> <span style="white-space: pre;"> </span>{{page:WordSection1;}}</b></span></p><p><span style="font-family: courier;"><b> --></b></span></p><p><span style="font-family: courier;"><b> </style></b></span></p><p><span style="font-family: courier;"><b> <!--[if gte mso 9]></b></span></p><p><span style="font-family: courier;"><b> <xml></b></span></p><p><span style="font-family: courier;"><b> <o:shapedefaults v:ext="edit" spidmax="1026" /></b></span></p><p><span style="font-family: courier;"><b> </xml></b></span></p><p><span style="font-family: courier;"><b> <![endif]--><!--[if gte mso 9]></b></span></p><p><span style="font-family: courier;"><b> <xml></b></span></p><p><span style="font-family: courier;"><b> <o:shapelayout v:ext="edit"></b></span></p><p><span style="font-family: courier;"><b> <o:idmap v:ext="edit" data="1" /></b></span></p><p><span style="font-family: courier;"><b> </o:shapelayout></b></span></p><p><span style="font-family: courier;"><b> </xml></b></span></p><p><span style="font-family: courier;"><b> <![endif]--></b></span></p><p><span style="font-family: courier;"><b> </head></b></span></p><p><span style="font-family: courier;"><b> <body lang=EN-US link="#0563C1" vlink="#954F72" 
style='word-wrap:break-word'></b></span></p><p><span style="font-family: courier;"><b> <div class=WordSection1></b></span></p><p><span style="font-family: courier;"><b> <table class=MsoNormalTable border=0 cellspacing=0 cellpadding=0 style='margin-left:-1.5pt;border-collapse:collapse'></b></span></p><p><span style="font-family: courier;"><b> <tr style='height:14.5pt'></b></span></p><p><span style="font-family: courier;"><b> </tr></b></span></p><p><span style="font-family: courier;"><b> </table></b></span></p><p><span style="font-family: courier;"><b> {0:s}</b></span></p><p><span style="font-family: courier;"><b> <p class=MsoNormal></b></span></p><p><span style="font-family: courier;"><b> <o:p>&nbsp;</o:p></b></span></p><p><span style="font-family: courier;"><b> </p></b></span></p><p><span style="font-family: courier;"><b> <p class=MsoNormal></b></span></p><p><span style="font-family: courier;"><b> <o:p>&nbsp;</o:p></b></span></p><p><span style="font-family: courier;"><b> </p></b></span></p><p><span style="font-family: courier;"><b> </div></b></span></p><p><span style="font-family: courier;"><b> </div></b></span></p><p><span style="font-family: courier;"><b> </body></b></span></p><p><span style="font-family: courier;"><b></html>"""</b></span></p><p><span style="font-family: courier;"><b><br /></b></span></p><p><span style="font-family: courier;"><b>GRAPHICFRAME = """</b></span></p><p><span style="font-family: courier;"><b> <div class=WordSection1></b></span></p><p><span style="font-family: courier;"><b> <p class=MsoNormal></b></span></p><p><span style="font-family: courier;"><b> <o:p>&nbsp;</o:p></b></span></p><p><span style="font-family: courier;"><b> </p></b></span></p><p><span style="font-family: courier;"><b> <p class=MsoNormal></b></span></p><p><span style="font-family: courier;"><b> <o:p>&nbsp;</o:p></b></span></p><p><span style="font-family: courier;"><b> </p></b></span></p><p><span style="font-family: courier;"><b> <p class=MsoNormal></b></span></p><p><span 
style="font-family: courier;"><b> <b></b></span></p><p><span style="font-family: courier;"><b> <o:p>&nbsp;</o:p></b></span></p><p><span style="font-family: courier;"><b> </b></b></span></p><p><span style="font-family: courier;"><b> </p></b></span></p><p><span style="font-family: courier;"><b> <p class=MsoNormal></b></span></p><p><span style="font-family: courier;"><b> <o:p>&nbsp;</o:p></b></span></p><p><span style="font-family: courier;"><b> </p></b></span></p><p><span style="font-family: courier;"><b> <p class=MsoNormal></b></span></p><p><span style="font-family: courier;"><b> <img width=410 height=410 style='width:4.2666in;height:4.2666in' id="Picture_x0020_2" src="cid:{0:s}" alt="Chart&#10;&#10;Description automatically generated"></b></span></p><p><span style="font-family: courier;"><b> <o:p></o:p></b></span></p><p><span style="font-family: courier;"><b> </p></b></span></p><p><span style="font-family: courier;"><b> <p class=MsoNormal></b></span></p><p><span style="font-family: courier;"><b> <o:p>&nbsp;</o:p></b></span></p><p><span style="font-family: courier;"><b> </p></b></span></p><p><span style="font-family: courier;"><b> <p class=MsoNormal></b></span></p><p><span style="font-family: courier;"><b> <o:p>&nbsp;</o:p></b></span></p><p><span style="font-family: courier;"><b> </p></b></span></p><p><span style="font-family: courier;"><b>"""</b></span></p><p><span style="font-family: courier;"><b><br /></b></span></p><p><span style="font-family: courier;"><b><br /></b></span></p><p><span style="font-family: courier;"><b>def getoutlook():</b></span></p><p><span style="font-family: courier;"><b> """</b></span></p><p><span style="font-family: courier;"><b> Return Outlook object.</b></span></p><p><span style="font-family: courier;"><b> """</b></span></p><p><span style="font-family: courier;"><b> return win32.gencache.EnsureDispatch('outlook.application')</b></span></p><p><span style="font-family: courier;"><b><br /></b></span></p><p><span style="font-family: 
courier;"><b>def makeemail(outlookobject, text, subject, recipient):</b></span></p><p><span style="font-family: courier;"><b> """</b></span></p><p><span style="font-family: courier;"><b> Return e-mail object</b></span></p><p><span style="font-family: courier;"><b> """</b></span></p><p><span style="font-family: courier;"><b> mail = outlookobject.CreateItem(0)</b></span></p><p><span style="font-family: courier;"><b> mail.To = recipient</b></span></p><p><span style="font-family: courier;"><b> mail.Subject = subject</b></span></p><p><span style="font-family: courier;"><b> mail.HTMLBody = text</b></span></p><p><span style="font-family: courier;"><b> return mail</b></span></p><p><span style="font-family: courier;"><b><br /></b></span></p><p><span style="font-family: courier;"><b>def addlogoshow(mailobject):</b></span></p><p><span style="font-family: courier;"><b> """</b></span></p><p><span style="font-family: courier;"><b> Embed cid image in e-mail.</b></span></p><p><span style="font-family: courier;"><b><br /></b></span></p><p><span style="font-family: courier;"><b> Save e-mail and bring up in window.</b></span></p><p><span style="font-family: courier;"><b> """</b></span></p><p><span style="font-family: courier;"><b> attachmnt = mailobject.Attachments.Add(PICLOC, win32.constants.olByValue, 0, 'paintbrush.png')</b></span></p><p><span style="font-family: courier;"><b> attachmnt.PropertyAccessor.SetProperty(PR_ATTACH_CONTENT_ID, 'paintbrush.png')</b></span></p><p><span style="font-family: courier;"><b> attachmnt.PropertyAccessor.SetProperty(PR_ATTACHMENT_HIDDEN, False)</b></span></p><p><span style="font-family: courier;"><b> mailobject.Save()</b></span></p><p><span style="font-family: courier;"><b> mailobject.Display()</b></span></p><p><span style="font-family: courier;"><b> mailobject.Save()</b></span></p><p><span style="font-family: courier;"><b>outlook = getoutlook()</b></span></p><p><span style="font-family: courier;"><b>htmlbody = 
BODYFORMAT.format(GRAPHICFRAME.format('paintbrush.png'))</b></span></p><p><b><span style="font-family: courier;">mail = makeemail(outlook, htmlbody, 'blah', 'XXXXXXXX@gmail.com')</span></b></p><p><span style="font-family: courier;"><b>addlogoshow(mail)</b></span></p><p><br /><br /></p>Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com0tag:blogger.com,1999:blog-524230429673765509.post-3909059697992088842017-12-09T18:18:00.000-08:002018-03-02T15:17:29.885-08:00Powershell Encoded Command, sqlcmd, and csv Query OutputA while back I did a <a href="http://pyright.blogspot.com/2015/09/mssql-sqlcmd-bcp-csv-dump-excel.html" target="_blank">post on using sqlcmd and dumping data to Excel</a>. At the time I was using Microsoft SQL Server's bcp (bulk copy) utility to dump data to a csv file.<br />
<br />
Use of bcp is blocked where I am working now. But Powershell and sqlcmd are very much available on the Windows workstations we use. Just as with bcp, smithing text for sqlcmd input can be a little tricky, and the same goes for Powershell. But Powershell has an <b><span style="font-family: "courier new" , "courier" , monospace;">EncodedCommand</span></b> feature that lets you feed it a command as a Base64 string. This will be a quick demo of that feature and of writing the query output to a faux comma-delimited (csv) file.<br />
<br />
<b>Disclaimer:</b> <i>Scripts that rely extensively on </i><span style="font-family: "courier new" , "courier" , monospace;"><b>os.system()</b></span> <i>calls from Python are indeed hacky and mousetrappy.</i> I think the saying goes "Necessity is a mother," or something similar. Onward.<br />
<br />
<b>Getting the Base64 string from the original string:</b><br />
<div>
<b><br /></b></div>
<div>
First, the SQL code, which queries a mock table I made in my mock database:</div>
<div>
<br /></div>
<div>
<span style="font-family: "courier new";"><b>USE test;</b></span></div>
<div>
<br /></div>
<div>
<b><span style="font-family: "courier new" , "courier" , monospace;">SELECT testpk,<br /> namex,<br /> [value]<br />FROM testtable<br />ORDER BY testpk;</span></b></div>
<div>
<b><span style="font-family: "courier new";"><br /></span></b></div>
<div>
We will call this file <span style="font-family: "courier new" , "courier" , monospace;"><b>selectdata.sql</b></span><span style="font-family: "times" , "times new roman" , serif;">.</span></div>
<div>
<span style="font-family: "times";"><br /></span></div>
<div>
<span style="font-family: "times";">Then the call to sqlcmd/Powershell:</span></div>
<div>
<span style="font-family: "times";"><br /></span></div>
<div>
<b><span style="font-family: "courier new" , "courier" , monospace;">sqlcmd -S localhost -i .\selectdata.sql -E -h -1 -s "," -W | Tee-Object -FilePath .\testoutput</span></b></div>
<div>
<b><span style="font-family: "times" , "times new roman" , serif;"><br /></span></b></div>
<div>
<span style="font-family: "times";">In Python (we have to use Python 2.7 in our environment, so this is Python 2.x specific):</span></div>
<div>
<span style="font-family: "times";"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><b>Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32<br />Type "help", "copyright", "credits" or "license" for more information.<br />>>> import base64<br />>>> stringx = r'sqlcmd -S localhost -i .\selectdata.sql -E -h -1 -s "," -W | Tee-Object -FilePath .\testoutput'<br />>>> bytesx = stringx.encode('utf-16-le')<br />>>> encodedcommandx = base64.b64encode(bytesx)<br />>>> encodedcommandx<br />'cwBxAGwAYwBtAGQAIAAtAFMAIABsAG8AYwBhAGwAaABvAHMAdAAgAC0AaQAgAC4AXABzAGUAbABlAGMAdABkAGEAdABhAC4AcwBxAGwAIAAtAEUAIAAtAGgAIAAtADEAIAAtAHMAIAAiACwAIgAgAC0AVwAgAHwAIABUAGUAZQAtAE8AYgBqAGUAYwB0ACAALQBGAGkAbABlAFAAYQB0AGgAIAAuAFwAdABlAHMAdABvAHUAdABwAHUAdAA='<br />>>></b></span></div>
<span style="font-family: "courier new" , "courier" , monospace;"></span><br />
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-family: "times" , "times new roman" , serif;"><b><br /></b></span></span></div>
<span style="font-family: "courier new" , "courier" , monospace;">
</span>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-family: "times";">I had to type out my command in the Python interpreter. When I pasted it in from GVim, it choked on the UTF encoding.</span></span></div>
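<div>
Under Python 3 the same conversion needs one extra step, because base64.b64encode() returns bytes rather than a str. Here is a minimal Python 3 sketch, with a decoder added for round-trip checking. The function names are hypothetical; the recipe itself (encode the command as UTF-16LE, then Base64) is the same one used in the Python 2 session above.</div>

```python
import base64


def encode_powershell_command(command):
    """Return the Base64 text expected by powershell -EncodedCommand.

    PowerShell reads the argument as Base64 of UTF-16LE text, hence the
    'utf-16-le' codec (which, unlike 'utf-16', writes no byte-order mark).
    """
    return base64.b64encode(command.encode('utf-16-le')).decode('ascii')


def decode_powershell_command(encoded):
    """Inverse of encode_powershell_command(), handy for checking a string."""
    return base64.b64decode(encoded).decode('utf-16-le')


# The same sqlcmd/Tee-Object command used throughout this post.
SQLCMD = (r'sqlcmd -S localhost -i .\selectdata.sql -E -h -1 -s "," -W'
          r' | Tee-Object -FilePath .\testoutput')
```

<div>
If all goes as intended, encode_powershell_command(SQLCMD) gives the same string as the Python 2 session above, and decoding it gives back the original command.</div>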
<span style="font-family: "courier new" , "courier" , monospace;">
<div>
<span style="font-family: "times";"><br /></span></div>
<div>
<span style="font-family: "times";">Now, Powershell:</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><b>PS C:\Users\ctrachte> $sqlcmdstring = 'sqlcmd -S localhost -i .\selectdata.sql -E -h -1 -s "," -W | Tee-Object -FilePath .\testoutput'<br />PS C:\Users\ctrachte> $encodedcommand = [Convert]::ToBase64String([Text.Encoding]::Unicode.GetBytes($sqlcmdstring))<br />PS C:\Users\ctrachte> $encodedcommand<br />cwBxAGwAYwBtAGQAIAAtAFMAIABsAG8AYwBhAGwAaABvAHMAdAAgAC0AaQAgAC4AXABzAGUAbABlAGMAdABkAGEAdABhAC4AcwBxAGwAIAAtAEUAIAAtAGgAIAAtADEAIAAtAHMAIAAiACwAIgAgAC0AVwAgAHwAIABUAGUAZQAtAE8AYgBqAGUAYwB0ACAALQBGAGkAbABlAFAAYQB0AGgAIAAuAFwAdABlAHMAdABvAHUAdABwAHUAdAA=<br />PS C:\Users\ctrachte></b></span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div>
<span style="font-family: "times";">OK, the two Base64 strings are identical, so we are good.</span></div>
<div>
<span style="font-family: "times";"><br /></span></div>
<div>
<span style="font-family: "times";"><b>Command Execution from <span style="font-family: "courier new" , "courier" , monospace;">os.system()</span> call:</b></span></div>
<div>
<span style="font-family: "times";"><b></b></span><b><span style="font-family: "times";"><br /></span></b></div>
<div>
<b><span style="font-family: "courier new" , "courier" , monospace;">>>> import os</span></b><br />
<b>>>> INVOKEPOWERSHELL = 'Powershell -EncodedCommand {0:s}'</b></div>
<div>
<b>>>> os.system(INVOKEPOWERSHELL.format(encodedcommandx))<br />Changed database context to 'test'.<br />000001,VOLUME,11.0<br />000002,YEAR,1999.0</b></div>
<div>
<b>(2 rows affected)<br />0<br />>>></b></div>
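<div>
<span style="font-family: "times";">Since the disclaimer above calls os.system() hacky, here is a minimal Python 3 sketch of the same call made with the standard library's subprocess module. It passes the arguments as a list, so no shell quoting is involved, and it captures the output and exit code instead of just printing them. The helper names are hypothetical, and the -NoProfile and -NonInteractive switches are standard PowerShell options added here as an assumption; the original call did not use them.</span></div>

```python
import shutil
import subprocess


def powershell_args(encoded_command, executable='powershell'):
    """Build the argument list for running a Base64 (UTF-16LE) command.

    Passing a list means no shell is involved, so nothing in the
    encoded string needs extra quoting.
    """
    return [executable, '-NoProfile', '-NonInteractive',
            '-EncodedCommand', encoded_command]


def run_encoded(encoded_command):
    """Run the encoded command; return (exit code, combined output).

    Returns None when no 'powershell' executable is on the PATH
    (for example, on a non-Windows machine).
    """
    executable = shutil.which('powershell')
    if executable is None:
        return None
    completed = subprocess.run(
        powershell_args(encoded_command, executable),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error text into the same output
        universal_newlines=True,   # return str rather than bytes
    )
    return completed.returncode, completed.stdout
```

<div>
<span style="font-family: "times";">Called with the encoded string as a str, run_encoded() would be expected to return the exit code (0 above) together with the text that os.system() printed to the console.</span></div>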
<div>
<b><span style="font-family: "times" , "times new roman" , serif;"><br /></span></b></div>
<div>
<span style="font-family: "times";">And, thanks to Tee-Object, Powershell's version of the Unix tee command, we have a faux csv file as well as output on the command line.</span></div>
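<div>
<span style="font-family: "times";">The faux csv file probably also contains the sqlcmd status lines shown in the console output above ("Changed database context..." and "(2 rows affected)"). Depending on the Powershell version, Tee-Object may write it as UTF-16 with a byte-order mark (older Windows Powershell) or as UTF-8, so this minimal Python 3 sketch sniffs the byte-order mark rather than assuming an encoding, then keeps only the data rows. The function names and the status-line pattern are assumptions based on the output shown here.</span></div>

```python
import csv
import re

# Status lines sqlcmd mixes in with the data (as in the console output above).
NOISE = re.compile(r"^(Changed database context to .*|\(\d+ rows? affected\))$")


def detect_encoding(path):
    """Guess a text file's encoding from its byte-order mark (BOM)."""
    with open(path, 'rb') as f:
        start = f.read(3)
    if start.startswith((b'\xff\xfe', b'\xfe\xff')):
        return 'utf-16'
    if start.startswith(b'\xef\xbb\xbf'):
        return 'utf-8-sig'
    return 'utf-8'


def parse_sqlcmd_lines(lines):
    """Return the data rows, dropping blank lines and sqlcmd status lines."""
    stripped = (line.strip() for line in lines)
    data = [line for line in stripped if line and not NOISE.match(line)]
    return list(csv.reader(data))


def read_faux_csv(path):
    """Read a file written by Tee-Object and return its data rows."""
    with open(path, encoding=detect_encoding(path), newline='') as f:
        return parse_sqlcmd_lines(f)
```

<div>
<span style="font-family: "times";">For the run above, read_faux_csv('testoutput') would be expected to return the two rows as lists of strings, ready for the csv module or Excel.</span></div>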
<div>
<span style="font-family: "times";"><br /></span></div>
<div>
<span style="font-family: "times";">Stack Overflow gave me much of what I needed to know for this:</span></div>
<span style="font-family: "times";"><div>
<br /></div>
<div>
Links:</div>
<div>
<br /></div>
<div>
Powershell's encoded command:</div>
<div>
<br /></div>
<div>
<b><span style="font-family: "courier new" , "courier" , monospace;">https://blogs.technet.microsoft.com/heyscriptingguy/2015/10/27/powertip-encode-string-and-execute-with-powershell/</span></b></div>
<div>
<b><span style="font-family: "times" , "times new roman" , serif;"><br /></span></b></div>
<div>
sqlcmd's output to faux csv:</div>
<div>
<br /></div>
<div>
<b><span style="font-family: "courier new" , "courier" , monospace;">https://stackoverflow.com/questions/425379/how-to-export-data-as-csv-format-from-sql-server-using-sqlcmd</span></b></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div>
The UTF encoding stuff just took some trial and error and fiddling.</div>
<div>
<br /></div>
<div>
Thanks for stopping by.</div>
</span><div>
<b><br /></b></div>
</span><span style="font-family: "courier new" , "courier" , monospace;"><b></b></span><br />
Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com0tag:blogger.com,1999:blog-524230429673765509.post-18737484706384791902017-02-19T16:34:00.000-08:002017-02-19T16:38:40.940-08:00Filling in Missing Grouping Columns of MSSQL SSRS Report Dumped to Excel<div class="separator" style="clear: both; text-align: center;">
</div>
This is another simple but common problem in certain business environments:<br />
<br />
1) Data are presented via a Microsoft SQL Server Reporting Services report, BUT<br />
<br />
2) The user wants the data in Excel, and, further, wants to play with it (pivot, etc.) there. The problem is that the grouping column labels are not in every record, only in the one row that begins the list of records for that group (sanitized screenshot below):<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJMGbmZPTkmD4En9U90HgXLdad2lmvAg755qvZ-jFx9eKhyphenhyphen9wcefARUyvM_MNg1gIcL12mxTE5UqohF-NLM9ELFOqNwt_iAxpM9uYV6HQh2EH7XcVATMTBKAGK-oM60DuEdGUvn4oj-Kc/s1600/SSRSDumpsanitized2.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="419" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJMGbmZPTkmD4En9U90HgXLdad2lmvAg755qvZ-jFx9eKhyphenhyphen9wcefARUyvM_MNg1gIcL12mxTE5UqohF-NLM9ELFOqNwt_iAxpM9uYV6HQh2EH7XcVATMTBKAGK-oM60DuEdGUvn4oj-Kc/s640/SSRSDumpsanitized2.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">But I don't WANT to copy and paste all those groupings for 30,000 records :*-(</td></tr>
</tbody></table>
I recently got this assignment as a remote request. It took about four rounds of e-mail to figure out that it really wasn't a data problem, but a formatting one that needed solving.<br />
<br />
It is possible to do the whole thing in Python. I did the Excel part by hand in order to get a handle on the data:<br />
<br />
1) In Excel, delete the extra rows on top of the report leaving just the headers and the data.<br />
<br />
2) In Excel, select everything on the data worksheet and format the cells correctly by clearing the Merge Cells and Wrap Text options.<br />
<br />
3) In Excel, at this point you should be able to see whether there are extra empty columns acting as space fillers; delete them. Save the worksheet as a csv file.<br />
<br />
4) In a text editor, open your csv file, identify any empty rows, and delete them. Change column header names as desired.<br />
<br />
Now the Python part:<br />
<br />
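<span style="font-family: "courier new" , "courier" , monospace;">#!python36<br />
<br />
"""<br />
Doctor csv dump from unmerged cell<br />
dump of SSRS dump from MSSQL database.<br />
<br />
Fill in cell gaps where merged<br />
cells had only one grouping value<br />
so that all rows are complete records.<br />
"""<br />
<br />
COMMA = ','<br />
EMPTY = ''<br />
<br />
INFILE = 'rawdata.csv'<br />
OUTFILE = 'canneddumpfixed.csv'<br />
<br />
ERRORFLAG = 'ERROR!'<br />
<br />
with open(INFILE, 'r') as f, open(OUTFILE, 'w') as f2:<br />
&nbsp; &nbsp; headerline = next(f)<br />
&nbsp; &nbsp; numbercolumns = len(headerline.split(COMMA))<br />
&nbsp; &nbsp; # Keep the header row in the output file.<br />
&nbsp; &nbsp; f2.write(headerline)<br />
<br />
&nbsp; &nbsp; # Assume at least one data column on far right;<br />
&nbsp; &nbsp; # only the grouping columns to its left get filled in.<br />
&nbsp; &nbsp; missingvalues = (numbercolumns - 1) * [ERRORFLAG]<br />
<br />
&nbsp; &nbsp; for linex in f:<br />
&nbsp; &nbsp; &nbsp; &nbsp; print('Processing line {:s} . . .'.format(linex))<br />
&nbsp; &nbsp; &nbsp; &nbsp; splitrecord = linex.split(COMMA)<br />
&nbsp; &nbsp; &nbsp; &nbsp; for slotx in range(0, numbercolumns - 1):<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if splitrecord[slotx] != EMPTY:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Remember the latest grouping value for this column.<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; missingvalues[slotx] = splitrecord[slotx]<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Empty cell: reuse the value from the rows above.<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; splitrecord[slotx] = missingvalues[slotx]<br />
&nbsp; &nbsp; &nbsp; &nbsp; f2.write(COMMA.join(splitrecord))<br />
<br />
print('Finished')</span><br />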
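The split-on-comma approach in the script assumes no field contains a comma; a grouping label such as "Smith, J", which Excel writes to the csv in quotes, would shift every column to its right. Here is a minimal sketch of the same fill-down using Python 3's standard csv module instead, assuming the same layout as the script (header row first, one data column on the far right, grouping columns to its left). The function names are hypothetical, and skipping all-empty rows is an optional extra that stands in for manual step 4.

```python
import csv

ERRORFLAG = 'ERROR!'  # same placeholder the script above uses


def fill_down(rows):
    """Yield csv rows with empty grouping cells filled from the rows above.

    rows is any iterable of lists (e.g. a csv.reader). The first row is
    taken as the header and passed through unchanged. Every column except
    the last is treated as a grouping column. Rows that are entirely empty
    are skipped.
    """
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return  # empty input
    yield header
    ncolumns = len(header)
    lastseen = [ERRORFLAG] * (ncolumns - 1)
    for row in rows:
        if not any(cell.strip() for cell in row):
            continue  # blank line or a row of bare commas
        # Pad short rows so every grouping column can be indexed.
        row = list(row) + [''] * (ncolumns - len(row))
        for i in range(ncolumns - 1):
            if row[i] != '':
                lastseen[i] = row[i]
            else:
                row[i] = lastseen[i]
        yield row


def fix_file(infile, outfile):
    """Read infile, fill in the grouping gaps, and write outfile."""
    with open(infile, newline='') as fin, \
            open(outfile, 'w', newline='') as fout:
        csv.writer(fout).writerows(fill_down(csv.reader(fin)))
```

Usage would be fix_file('rawdata.csv', 'canneddumpfixed.csv'). Opening both files with newline='' is what the csv module documentation recommends, and it keeps quoted fields intact on the way back out.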
<br />
<span style="font-family: inherit;">At this point you've got your data in csv format - you can open it in Excel and go to work.</span><br />
<br />
<span style="font-family: inherit;">There may be a free or COTS (commercial off-the-shelf) utility that does all this somewhere in the Microsoft "ecosystem" (I think that's their fancy enviro-friendly word for vendor-user community), but I don't know of one.</span><br />
<br />
<span style="font-family: inherit;">Thanks for stopping by.</span><br />
<span style="font-family: inherit;"><br /></span>Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com3tag:blogger.com,1999:blog-524230429673765509.post-36464894293372070002017-02-15T19:40:00.000-08:002017-02-15T19:40:31.421-08:00Crude Testing of Equivalent Code With assertIn engineering and business environments, it is common to have to<br />
<br />
1) recreate an equivalent calculation in a different format for a different purpose and check the results against the original calculation.<br />
<br />
<br />
2) shepherd a calculation process from one vendor system through a transition to another (an upgrade, for example) by hacking a set of provisional scripts together.<br />
<br />
<br />
3) implement a bunch of linear regressions in calculations. If I recall correctly, Excel has had linear regression functionality for ages (since the early '90s?); it is the tried and (maybe) true tool of data fitters/forcers everywhere. Conceivably you could model just about any curve accurately, if not precisely, with enough linear segments. Mercifully, the ones I show below have only two segments per data set.<br />
<br />
This problem embodies all three bullets above. I've sanitized the code, which makes it a little ridiculous, but no less voluminous (sorry).<br />
<br />
Here's what we have in the vendor's system. It is Python 2.7 code, but it runs inside special, a la carte purchased software that my department doesn't have. It is also full of constants that I'm not really comfortable recognizing or maintaining:<br />
<br />
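<span style="font-family: "courier new" , "courier" , monospace;">"""<br />
Cut and pasted formulas from vendor<br />
specific GUI/Python API.<br />
"""<br />
<br />
# LOC1<br />
def loc1fromvendor(CONTROL1, CONTROL2, x):<br />
&nbsp; &nbsp; """<br />
&nbsp; &nbsp; Loc1 y calculation from vendor.<br />
<br />
&nbsp; &nbsp; CONTROL1 is the primary code (integer<br />
&nbsp; &nbsp; or round digit float).<br />
&nbsp; &nbsp; CONTROL2 is the secondary code (integer<br />
&nbsp; &nbsp; or round digit float).<br />
&nbsp; &nbsp; x is the x-axis input (float).<br />
<br />
&nbsp; &nbsp; Returns float.<br />
&nbsp; &nbsp; """<br />
&nbsp; &nbsp; DEFAULTY = 2.50<br />
<br />
&nbsp; &nbsp; if CONTROL1 == 9:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.275:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = (-0.0003 * x) + 6.4781<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.53<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.54<br />
&nbsp; &nbsp; elif CONTROL1 == 8:<br />
&nbsp; &nbsp; &nbsp; &nbsp; Y = 2.6<br />
&nbsp; &nbsp; elif CONTROL1 == 7:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.315:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.003 * x + 6.548<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.6<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0031 * x + 2.958<br />
&nbsp; &nbsp; elif CONTROL1 == 6:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.310:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0018 * x + 4.9307<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.57<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0004 * x + 3.0612<br />
&nbsp; &nbsp; elif CONTROL1 == 5:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.250:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0026 * x + 5.7152<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.47<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0003 * x + 2.8733<br />
&nbsp; &nbsp; elif CONTROL1 == 4:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.290:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0032 * x + 6.7257<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.6<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0002 * x + 2.8215<br />
&nbsp; &nbsp; elif CONTROL1 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.35<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.45<br />
&nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; Y = DEFAULTY<br />
&nbsp; &nbsp; return Y<br />
<br />
# LOC2<br />
def loc2fromvendor(CONTROL1, CONTROL2, x):<br />
&nbsp; &nbsp; """<br />
&nbsp; &nbsp; Loc2 y calculation from vendor.<br />
<br />
&nbsp; &nbsp; CONTROL1 is the primary code (integer<br />
&nbsp; &nbsp; or round digit float).<br />
&nbsp; &nbsp; CONTROL2 is the secondary code (integer<br />
&nbsp; &nbsp; or round digit float).<br />
&nbsp; &nbsp; x is the x-axis input (float).<br />
<br />
&nbsp; &nbsp; Returns float.<br />
&nbsp; &nbsp; """<br />
&nbsp; &nbsp; DEFAULTY = 2.50<br />
<br />
&nbsp; &nbsp; if CONTROL1 == 9:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0006 * x + 3.3121<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0006 * x + 3.3121<br />
&nbsp; &nbsp; elif CONTROL1 == 8:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.050:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.65<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.65<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.050:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.65<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.65<br />
&nbsp; &nbsp; elif CONTROL1 == 7:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.050:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0012 * x + 3.886<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0012 * x + 3.886<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.050:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.00007 * x + 2.6787<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.00007 * x + 2.6787<br />
&nbsp; &nbsp; elif CONTROL1 == 6:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.050:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.001 * x + 3.731<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.001 * x + 3.731<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.050:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0012 * x + 4.0757<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0012 * x + 4.0757<br />
&nbsp; &nbsp; elif CONTROL1 == 5:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.050:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.1<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.1<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.050:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0003 * x + 2.9564<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0003 * x + 2.9564<br />
&nbsp; &nbsp; elif CONTROL1 == 4:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.050:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.000009 * x + 2.1972<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.000009 * x + 2.1972<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.050:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0005 * x + 3.2461<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0005 * x + 3.2461<br />
&nbsp; &nbsp; elif CONTROL1 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.001 * x + 3.7257<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.001 * x + 3.7257<br />
&nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; Y = DEFAULTY<br />
&nbsp; &nbsp; return Y<br />
<br />
# LOC3<br />
def loc3fromvendor(CONTROL1, CONTROL2, x):<br />
&nbsp; &nbsp; """<br />
&nbsp; &nbsp; Loc3 y calculation from vendor.<br />
<br />
&nbsp; &nbsp; CONTROL1 is the primary code (integer<br />
&nbsp; &nbsp; or round digit float).<br />
&nbsp; &nbsp; CONTROL2 is the secondary code (integer<br />
&nbsp; &nbsp; or round digit float).<br />
&nbsp; &nbsp; x is the x-axis input (float).<br />
<br />
&nbsp; &nbsp; Returns float.<br />
&nbsp; &nbsp; """<br />
&nbsp; &nbsp; DEFAULTY = 2.50<br />
<br />
&nbsp; &nbsp; if CONTROL1 == 9:<br />
&nbsp; &nbsp; &nbsp; &nbsp; Y = 2.49<br />
&nbsp; &nbsp; elif CONTROL1 == 8:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if x > 1.000:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0006 * x + 3.3291<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.64<br />
&nbsp; &nbsp; elif CONTROL1 == 7:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if x > 1.050:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0009 * x + 3.5929<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.67<br />
&nbsp; &nbsp; elif CONTROL1 == 6:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if x > 1.080:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0013 * x + 4.0665<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Debug.<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # print 'x in vendor function = {:f}'.format(x)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.65<br />
&nbsp; &nbsp; elif CONTROL1 == 5:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if x > 950:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.001 * x + 3.4996<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.59<br />
&nbsp; &nbsp; elif CONTROL1 == 4:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if x > 1.100:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0018 * x + 4.6690<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.68<br />
&nbsp; &nbsp; elif CONTROL1 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if x > 1.000:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0004 * x + 2.8857<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.49<br />
&nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; Y = DEFAULTY<br />
&nbsp; &nbsp; return Y<br />
<br />
# LOC4<br />
def loc4fromvendor(CONTROL1, CONTROL2, x):<br />
&nbsp; &nbsp; """<br />
&nbsp; &nbsp; Loc4 y calculation from vendor.<br />
<br />
&nbsp; &nbsp; CONTROL1 is the primary code (integer<br />
&nbsp; &nbsp; or round digit float).<br />
&nbsp; &nbsp; CONTROL2 is the secondary code (integer<br />
&nbsp; &nbsp; or round digit float).<br />
&nbsp; &nbsp; x is the x-axis input (float).<br />
<br />
&nbsp; &nbsp; Returns float.<br />
&nbsp; &nbsp; """<br />
&nbsp; &nbsp; DEFAULTY = 2.50<br />
<br />
&nbsp; &nbsp; if CONTROL1 == 9:<br />
&nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0000008 * x + 2.6761<br />
&nbsp; &nbsp; elif CONTROL1 == 8:<br />
&nbsp; &nbsp; &nbsp; &nbsp; Y = -0.000003 * x + 2.6975<br />
&nbsp; &nbsp; elif CONTROL1 == 7:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.000:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0018 * x + 4.3902<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.60<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.00009 * x + 2.7334<br />
&nbsp; &nbsp; elif CONTROL1 == 6:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.100:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0013 * x + 4.0322<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.58<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0002 * x + 2.8081<br />
&nbsp; &nbsp; elif CONTROL1 == 5:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0018 * x + 4.2758<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0001 * x + 2.6535<br />
&nbsp; &nbsp; elif CONTROL1 == 4:<br />
&nbsp; &nbsp; &nbsp; &nbsp; if CONTROL2 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1.000:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.002 * x + 4.5548<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.60<br />
&nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if x > 1125:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0011 * x + 3.9184<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y = 2.65<br />
&nbsp; &nbsp; elif CONTROL1 == 1:<br />
&nbsp; &nbsp; &nbsp; &nbsp; Y = -0.0003 * x + 2.7802<br />
&nbsp; &nbsp; else:<br />
&nbsp; &nbsp; &nbsp; &nbsp; Y = DEFAULTY<br />
&nbsp; &nbsp; return Y</span><br />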
<br />
<br />
Rather than one function per location, my code uses a single function that looks up slopes and intercepts in nested dictionaries, one per location, rolled into one big dictionary. I'm not arguing my approach is necessarily better. For instance, I gave my x ranges lower bounds based on the precision of my data (a range ending at 1.27500 is followed by one starting at 1.27501). This isn't very portable.<br />
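One way around the portability problem is to store only the breakpoints and let Python's `bisect` module pick the segment. This is a minimal Python 3 sketch under my own assumptions, not code from the vendor or from the script further down. `bisect_left` returns 0 for any x at or below a breakpoint and 1 above it, which mirrors the vendor's `if x > 1.275` test, so there is no gap between 1.27500 and 1.27501. The names `BREAKS`, `SEGMENTS` and `piecewise_y` are hypothetical; the numbers are the loc1, CONTROL1 = 9, CONTROL2 = 1 case from the vendor code above.

```python
from bisect import bisect_left

# Hypothetical table for loc1, CONTROL1 == 9, CONTROL2 == 1.
# BREAKS holds the vendor's "if x > threshold" cut points in ascending order;
# SEGMENTS holds one (slope, intercept) pair per interval, so
# len(SEGMENTS) == len(BREAKS) + 1.
BREAKS = [1.275]
SEGMENTS = [(0.0, 2.53),        # x <= 1.275
            (-0.0003, 6.4781)]  # x > 1.275


def piecewise_y(x, breaks=BREAKS, segments=SEGMENTS):
    """Return y = m * x + b for the interval that contains x."""
    # bisect_left puts an x equal to a breakpoint in the lower interval,
    # matching the vendor's strict "x > threshold" comparison.
    m, b = segments[bisect_left(breaks, x)]
    return m * x + b


for x in (0.0, 1.275, 1.275005, 5000.0):
    print(x, piecewise_y(x))
```

Adding a breakpoint then means appending one number to `BREAKS` and one pair to `SEGMENTS`, with no precision-dependent lower bounds to maintain.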
<br />
Because my results had to stay in line with the vendor's originals, I used the assert statement and wrote a little loop that walks my dictionary and compares the dictionary-based y against the vendor function's y at the endpoints of every x range. This way, when I get a new "vendor function" (actually a snippet of code for a particular location or area), I can paste it into this crude, ersatz test suite and see what needs changing.<br />
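Testing only the endpoints of each range cannot see the sliver between, say, 1.27500 and 1.27501. A hedged Python 3 sketch of one extra check, assuming hypothetical stand-ins `vendor_y` (one vendor branch) and `table_y` (the matching inclusive-range lookup): probe the point halfway between neighbouring ranges as well as the endpoints, and collect every mismatch instead of exiting on the first one. The tolerance passed to `math.isclose` is my own choice.

```python
import math


# Hypothetical stand-in for the vendor's loc1, CONTROL1 == 9, CONTROL2 == 1 branch.
def vendor_y(x):
    if x > 1.275:
        return -0.0003 * x + 6.4781
    return 2.53


# Hypothetical inclusive ranges with lower bounds set to the data's precision.
RANGES = {(0.0, 1.27500): (0.0, 2.53),
          (1.27501, 5000.0): (-0.0003, 6.4781)}


def table_y(x):
    """Return m * x + b for the range containing x, or None if x falls in no range."""
    for (low, high), (m, b) in RANGES.items():
        if low <= x <= high:
            return m * x + b
    return None


def probe_points(ranges):
    """Endpoints of every range plus the point halfway between neighbouring ranges."""
    bounds = sorted(ranges)
    points = [x for pair in bounds for x in pair]
    for (_, high), (low, _) in zip(bounds, bounds[1:]):
        points.append((high + low) / 2.0)
    return sorted(points)


mismatches = []
for x in probe_points(RANGES):
    got, want = table_y(x), vendor_y(x)
    if got is None or not math.isclose(got, want, rel_tol=1e-12):
        mismatches.append((x, got, want))

# Only the probe inside the gap (about 1.275005) should be reported.
print(mismatches)
```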
<br />
Using this approach, I caught a few missed decimal places, typos, transposed digits, and plain old omissions in my code. It is possible I've gone overboard with constants. I don't care. I have to read them, and the only way I can keep them straight is to line up the decimal places and lock them down as named constants (programmatically they are variables, but I never change them).<br />
<br />
"But why don't they and why don't you use scientific notation?"<br />
<br />
As we used to say in the Navy years ago, "There is the right way, there is the wrong way, and the Navy way." Guess which one the vendor uses? Onward.<br />
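For what it's worth, the notation is purely a readability choice. Python parses both spellings of a literal to the same float, and string formatting can produce the lined-up fixed-point form on demand. A quick Python 3 check using two of the slope constants from my code:

```python
# The padded fixed-point spelling used for the named constants and the
# scientific-notation spelling parse to the same double.
NEG0003000 = -0.0003000
NEG0000008 = -0.0000008

print(NEG0003000 == -3e-4)  # True
print(NEG0000008 == -8e-7)  # True

# Fixed-point formatting recovers the lined-up decimal places for display.
print('{0:.7f}'.format(NEG0003000))  # -0.0003000
print('{0:.7f}'.format(NEG0000008))  # -0.0000008
```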
<br />
Here's my code with the "test" of equivalency for the two approaches:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">"""<br />Attempt at generic script to process linear regressions<br />for multiple areas.<br />"""<br /><br />import sys<br /><br />import vendorformulas as vfx<br /><br /># Loc abbreviations.<br />LOC1 = 'loc1'<br />LOC2 = 'loc2'<br />LOC3 = 'loc3'<br />LOC4 = 'loc4'<br /><br />DEFAULTY = 2.50 <br /><br />CTL2ONE = 1<br />CTL2TWO = 2<br /><br />BIGX = 5000.0<br />LITTLEX = 0.0<br /><br />TYPE9 = 9<br />TYPE8 = 8<br />TYPE7 = 7<br />TYPE6 = 6<br />TYPE5 = 5<br />TYPE4 = 4<br />TYPE1 = 1<br /># Undefined control1 type for default for each loc.<br />UNDEF = 99<br /><br />slope = 'm'<br />b = 'b'<br /><br /># Compute y using formula (y = mx + b), control1, x, control2<br /># nested dictionaries<br /># control2<br /># x range<br /># m<br /># b<br /># Original logic gives unassigned CONTROL2 block to CTL2TWO interpretation<br /># Honor this in logic in program.<br /><br /># Slope values.<br />NOSLOPE = 0.0<br /><br />NEG0032000 = -0.0032000<br />NEG0031000 = -0.0031000<br />NEG0030000 = -0.0030000<br />NEG0026000 = -0.0026000<br />NEG0020000 = -0.0020000<br />NEG0018000 = -0.0018000<br />NEG0013000 = -0.0013000<br />NEG0012000 = -0.0012000<br />NEG0011000 = -0.0011000<br />NEG0010000 = -0.0010000<br />NEG0009000 = -0.0009000<br />NEG0006000 = -0.0006000<br />NEG0005000 = -0.0005000<br />NEG0004000 = -0.0004000<br />NEG0003000 = -0.0003000<br />NEG0002000 = -0.0002000<br />NEG0001000 = -0.0001000<br />NEG0000900 = -0.0000900<br />NEG0000700 = -0.0000700<br />NEG0000090 = -0.0000090<br />NEG0000030 = -0.0000030<br />NEG0000008 = -0.0000008<br /><br /># Intercept values.<br />T2PT1000 = 2.1000<br />T2PT1972 = 2.1972<br />T2PT3500 = 2.3500<br />T2PT4500 = 2.4500<br />T2PT4700 = 2.4700<br />T2PT4900 = 2.4900<br />T2PT5300 = 2.5300<br />T2PT5400 = 2.5400<br />T2PT5700 = 2.5700<br />T2PT5800 = 2.5800<br />T2PT5900 = 2.5900<br />T2PT6000 = 2.6000<br />T2PT6400 = 2.6400<br />T2PT6500 = 2.6500<br />T2PT6535 = 
2.6535<br />T2PT6700 = 2.6700<br />T2PT6761 = 2.6761<br />T2PT6787 = 2.6787<br />T2PT6800 = 2.6800<br />T2PT6975 = 2.6975<br />T2PT7334 = 2.7334<br />T2PT7802 = 2.7802<br />T2PT8081 = 2.8081<br />T2PT8215 = 2.8215<br />T2PT8733 = 2.8733<br />T2PT8857 = 2.8857<br />T2PT9564 = 2.9564<br />T2PT9580 = 2.9580<br />T3PT0612 = 3.0612<br />T3PT2461 = 3.2461<br />T3PT3121 = 3.3121<br />T3PT3291 = 3.3291<br />T3PT4996 = 3.4996<br />T3PT5929 = 3.5929<br />T3PT7257 = 3.7257<br />T3PT7310 = 3.7310<br />T3PT8860 = 3.8860<br />T3PT9184 = 3.9184<br />F4PT0322 = 4.0322<br />F4PT0665 = 4.0665<br />F4PT0757 = 4.0757<br />F4PT2758 = 4.2758<br />F4PT3902 = 4.3902<br />F4PT5548 = 4.5548<br />F4PT6690 = 4.6690<br />F4PT9307 = 4.9307<br />F5PT7152 = 5.7152<br />S6PT4781 = 6.4781<br />S6PT5480 = 6.5480<br />S6PT7257 = 6.7257<br /><br />LOC1YS = {TYPE9:<br /> {CTL2ONE:<br /> {(LITTLEX, 1.27500):<br /> {slope:NOSLOPE, b:T2PT5300},<br /> (1.27501, BIGX):<br /> {slope:NEG0003000, b:S6PT4781}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:T2PT5400}}},<br /> TYPE8:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:T2PT6000}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:T2PT6000}}},<br /> TYPE7:<br /> {CTL2ONE:<br /> {(LITTLEX, 1.31500):<br /> {slope:NOSLOPE, b:T2PT6000},<br /> (1.31501, BIGX):<br /> {slope:NEG0030000, b:S6PT5480}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0031000, b:T2PT9580}}},<br /> TYPE6:<br /> {CTL2ONE:<br /> {(LITTLEX, 1.31000):<br /> {slope:NOSLOPE, b:T2PT5700},<br /> (1.31001, BIGX):<br /> {slope:NEG0018000, b:F4PT9307}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0004000, b:T3PT0612}}},<br /> TYPE5:<br /> {CTL2ONE:<br /> {(LITTLEX, 1.25000):<br /> {slope:NOSLOPE, b:T2PT4700},<br /> (1.25001, BIGX):<br /> {slope:NEG0026000, b:F5PT7152}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0003000, b:T2PT8733}}},<br /> TYPE4:<br /> {CTL2ONE:<br /> {(LITTLEX, 1.29000):<br /> {slope:NOSLOPE, 
b:T2PT6000},<br /> (1.29001, BIGX):<br /> {slope:NEG0032000, b:S6PT7257}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0002000, b:T2PT8215}}},<br /> TYPE1:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:T2PT3500}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:T2PT4500}}},<br /> UNDEF:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:DEFAULTY}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:DEFAULTY}}}}<br /># END LOC1<br /><br /># LOC2<br />LOC2YS = {TYPE9:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0006000, b:T3PT3121}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0006000, b:T3PT3121}}},<br /> TYPE8:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:T2PT6500}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:T2PT6500}}},<br /> TYPE7:{CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0012000, b:T3PT8860}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0000700, b:T2PT6787}}},<br /> TYPE6:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0010000, b:T3PT7310}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0012000, b:F4PT0757}}},<br /> TYPE5:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:T2PT1000}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0003000, b:T2PT9564}}},<br /> TYPE4:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0000090, b:T2PT1972}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0005000, b:T3PT2461}}},<br /> TYPE1:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0010000, b:T3PT7257}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0010000, b:T3PT7257}}},<br /> UNDEF:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:DEFAULTY}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:DEFAULTY}}}}<br /># END LOC2<br /><br />LOC3YS = {TYPE9:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:T2PT4900}},<br /> 
CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:T2PT4900}}},<br /> TYPE8:{CTL2ONE:<br /> {(LITTLEX, 1.00000):<br /> {slope:NOSLOPE, b:T2PT6400},<br /> (1.00001, BIGX):<br /> {slope:NEG0006000, b:T3PT3291}},<br /> CTL2TWO:<br /> {(LITTLEX, 1.00000):<br /> {slope:NOSLOPE, b:T2PT6400},<br /> (1.00001, BIGX):<br /> {slope:NEG0006000, b:T3PT3291}}},<br /> TYPE7:{CTL2ONE:<br /> {(LITTLEX, 1.05000):<br /> {slope:NOSLOPE, b:T2PT6700},<br /> (1.05001, BIGX):<br /> {slope:NEG0009000, b:T3PT5929}},<br /> CTL2TWO:<br /> {(LITTLEX, 1.05000):<br /> {slope:NOSLOPE, b:T2PT6700},<br /> (1.050001, BIGX):<br /> {slope:NEG0009000, b:T3PT5929}}},<br /> TYPE6:<br /> {CTL2ONE:<br /> {(LITTLEX, 1.08000):<br /> {slope:NOSLOPE, b:T2PT6500},<br /> (1.08001, BIGX):<br /> {slope:NEG0013000, b:F4PT0665}},<br /> CTL2TWO:<br /> {(LITTLEX, 1.08000):<br /> {slope:NOSLOPE, b:T2PT6500},<br /> (1.08001, BIGX):<br /> {slope:NEG0013000, b:F4PT0665}}},<br /> TYPE5:<br /> {CTL2ONE:<br /> {(LITTLEX, 950.0):<br /> {slope:NOSLOPE, b:T2PT5900},<br /> (950.01, BIGX):<br /> {slope:NEG0010000, b:T3PT4996}},<br /> CTL2TWO:<br /> {(LITTLEX, 950.0):<br /> {slope:NOSLOPE, b:T2PT5900},<br /> (950.01, BIGX):<br /> {slope:NEG0010000, b:T3PT4996}}},<br /> TYPE4:<br /> {CTL2ONE:<br /> {(LITTLEX, 1.10000):<br /> {slope:NOSLOPE, b:T2PT6800},<br /> (1.10001, BIGX):<br /> {slope:NEG0018000, b:F4PT6690}},<br /> CTL2TWO:<br /> {(LITTLEX, 1.10000):<br /> {slope:NOSLOPE, b:T2PT6800},<br /> (1.10001, BIGX):<br /> {slope:NEG0018000, b:F4PT6690}}},<br /> TYPE1:<br /> {CTL2ONE:<br /> {(LITTLEX, 1.00000):<br /> {slope:NOSLOPE, b:T2PT4900},<br /> (1.00001, BIGX):<br /> {slope:NEG0004000, b:T2PT8857}},<br /> CTL2TWO:<br /> {(LITTLEX, 1.00000):<br /> {slope:NOSLOPE, b:T2PT4900},<br /> (1.00001, BIGX):<br /> {slope:NEG0004000, b:T2PT8857}}},<br /> UNDEF:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:DEFAULTY}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:DEFAULTY}}}}<br /># END LOC3<br /><br 
/># LOC4<br />LOC4YS = {TYPE9:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0000008, b:T2PT6761}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0000008, b:T2PT6761}}},<br /> TYPE8:{CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0000030, b:T2PT6975}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0000030, b:T2PT6975}}},<br /> TYPE7:<br /> {CTL2ONE:<br /> {(LITTLEX, 1.00000):<br /> {slope:NOSLOPE, b:T2PT6000},<br /> (1.00001, BIGX):<br /> {slope:NEG0018000, b:F4PT3902}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0000900, b:T2PT7334}}},<br /> TYPE6:<br /> {CTL2ONE:<br /> {(LITTLEX, 1.10000):<br /> {slope:NOSLOPE, b:T2PT5800},<br /> (1.10001, BIGX):<br /> {slope:NEG0013000, b:F4PT0322}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0002000, b:T2PT8081}}},<br /> TYPE5:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0018000, b:F4PT2758}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0001000, b:T2PT6535}}},<br /> TYPE4:<br /> {CTL2ONE:<br /> {(LITTLEX, 1.00000):<br /> {slope:NOSLOPE, b:T2PT6000},<br /> (1.00001, BIGX):<br /> {slope:NEG0020000, b:F4PT5548}},<br /> CTL2TWO:<br /> {(LITTLEX, 1125.0):<br /> {slope:NOSLOPE, b:T2PT6500},<br /> (1125.01, BIGX):<br /> {slope:NEG0011000, b:T3PT9184}}},<br /> TYPE1:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0003000, b:T2PT7802}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NEG0003000, b:T2PT7802}}},<br /> UNDEF:<br /> {CTL2ONE:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:DEFAULTY}},<br /> CTL2TWO:<br /> {(LITTLEX, BIGX):<br /> {slope:NOSLOPE, b:DEFAULTY}}}}<br /># END LOC4<br /><br />YS = {LOC1:LOC1YS,<br /> LOC2:LOC2YS,<br /> LOC3:LOC3YS,<br /> LOC4:LOC4YS}<br /><br />VALIDCONTROL1 = [TYPE9, TYPE8, TYPE7, TYPE6, TYPE5, TYPE4, TYPE1]<br /><br />RETURNDEFAULTMSG = 'Returning default Y for {0:s}, {1:2.0f}, {2:2.0f}, {3:8.5f} . . .'<br />TESTINGMSG = 'Testing dictionary based y == function based y for {0:s}, {1:d}, {2:d}, {3:8.5f} . . 
.'<br />ASSERTIONERRORMSG = 'Assertion Error for {0:s}, {1:d}, {2:d}, {3:8.5f} . . .'<br /><br />def gety(loc, control1, x, control2):<br /> """<br /> y calculation for y = mx + b.<br /><br /> loc is the four letter loc abbreviation (loc1).<br /><br /> control1 is the integer CONTROL1 code.<br /><br /> x is a float for the x component of y = mx + b.<br /><br /> control2 is the integer CONTROL2 code.<br /> """<br /> # Compute y using formula (y = mx + b), control1, x, control2.<br /> # Match loc.<br /> ydictionary = YS[loc]<br /> # Check if control1 code belongs to recognized types.<br /> if control1 in VALIDCONTROL1:<br /> # Match control1.<br /> for control2x in ydictionary[control1]:<br /> # match control2 (.get() skips an undefined control2 instead of raising KeyError).<br /> for xrangex in ydictionary[control1].get(control2, {}):<br /> # match x range.<br /> if (x >= xrangex[0] and<br /> x <= xrangex[1] and control2x == control2):<br /> mxb = ydictionary[control1][control2][xrangex]<br /> y = mxb[slope] * x + mxb[b]<br /> return y<br /> # Possible that control2 not defined;<br /> # Defaults to CTL2TWO.<br /> for xrangex in ydictionary[control1][CTL2TWO]:<br /> # match elevation range.<br /> if (x >= xrangex[0] and<br /> x <= xrangex[1]):<br /> mxb = ydictionary[control1][CTL2TWO][xrangex]<br /> y = mxb[slope] * x + mxb[b]<br /> return y<br /> # Doesn't matter if CTL2TWO or CTL2ONE or undefined<br /> # - default for loc will always be [UNDEF][CTL2TWO].<br /> print(RETURNDEFAULTMSG.format(loc, control1, control2, x))<br /> return ydictionary[UNDEF][CTL2TWO][(LITTLEX, BIGX)][b]<br /><br /># TEST Calculations.<br />TESTFUNCS = {LOC1:vfx.loc1fromvendor,<br /> LOC2:vfx.loc2fromvendor,<br /> LOC3:vfx.loc3fromvendor,<br /> LOC4:vfx.loc4fromvendor}<br /><br />for locx in YS:<br /> for control1 in YS[locx]:<br /> for control2 in YS[locx][control1]:<br /> for xrangex in YS[locx][control1][control2]:<br /> for z in xrangex:<br /> dictionarybasedy = gety(locx, control1, z, control2)<br /> functionbasedy = TESTFUNCS[locx](control1, control2, 
z)<br /> print(TESTINGMSG.format(locx, control1, control2, z))<br /> print('dictionarybasedy = {0:8.7f}'.format(dictionarybasedy))<br /> print('functionbasedy = {0:8.7f}'.format(functionbasedy))<br /> try:<br /> assert dictionarybasedy == functionbasedy<br /> except AssertionError:<br /> print(ASSERTIONERRORMSG.format(locx, control1, control2, z))<br /> sys.exit(1)</span><br />
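The comments in gety say an unassigned CONTROL2 should get the CTL2TWO interpretation, the way the vendor's `else` branches do. Here is a minimal Python 3 sketch of that rule on its own, assuming a hypothetical one-location slice `TABLE` and helper `lookup_y` that are not part of the script. It checks for the CONTROL2 key before indexing, so an unexpected code such as 3 falls back to the CTL2TWO ranges; indexing a missing key directly would raise `KeyError`. The intercepts 2.35 and 2.45 are the loc1, CONTROL1 = 1 values from the vendor code.

```python
CTL2ONE, CTL2TWO = 1, 2
LITTLEX, BIGX = 0.0, 5000.0

# Hypothetical slice of a loc table: control2 -> {(xlow, xhigh): (m, b)}.
TABLE = {CTL2ONE: {(LITTLEX, BIGX): (0.0, 2.35)},
         CTL2TWO: {(LITTLEX, BIGX): (0.0, 2.45)}}


def lookup_y(control2, x, table=TABLE, default=2.50):
    """y = m * x + b; unknown control2 codes use the CTL2TWO ranges."""
    ranges = table.get(control2, table[CTL2TWO])
    for (low, high), (m, b) in ranges.items():
        if low <= x <= high:
            return m * x + b
    # x outside every range: fall back to the default y.
    return default


print(lookup_y(CTL2ONE, 10.0))  # 2.35
print(lookup_y(CTL2TWO, 10.0))  # 2.45
print(lookup_y(3, 10.0))        # 2.45, same as CTL2TWO
```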
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: inherit;">And the output:<br /><br /><br /><span style="font-family: "courier new" , "courier" , monospace;">Testing dictionary based y == function based y for loc2, 1, 1, 0.00000 . . .<br />dictionarybasedy = 3.7257000<br />functionbasedy = 3.7257000<br />Testing dictionary based y == function based y for loc2, 1, 1, 5000.00000 . . .<br />dictionarybasedy = -1.2743000<br />functionbasedy = -1.2743000<br />Testing dictionary based y == function based y for loc2, 1, 2, 0.00000 . . .<br />dictionarybasedy = 3.7257000<br />functionbasedy = 3.7257000<br />Testing dictionary based y == function based y for loc2, 1, 2, 5000.00000 . . .<br />dictionarybasedy = -1.2743000<br />functionbasedy = -1.2743000<br />Returning default Y for loc2, 99, 1, 0.00000 . . .<br />Testing dictionary based y == function based y for loc2, 99, 1, 0.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Returning default Y for loc2, 99, 1, 5000.00000 . . .<br />Testing dictionary based y == function based y for loc2, 99, 1, 5000.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Returning default Y for loc2, 99, 2, 0.00000 . . .<br />Testing dictionary based y == function based y for loc2, 99, 2, 0.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Returning default Y for loc2, 99, 2, 5000.00000 . . .<br />Testing dictionary based y == function based y for loc2, 99, 2, 5000.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Testing dictionary based y == function based y for loc2, 4, 1, 0.00000 . . .<br />dictionarybasedy = 2.1972000<br />functionbasedy = 2.1972000<br />Testing dictionary based y == function based y for loc2, 4, 1, 5000.00000 . . 
.<br />dictionarybasedy = 2.1522000<br />functionbasedy = 2.1522000<br />Testing dictionary based y == function based y for loc2, 4, 2, 0.00000 . . .<br />dictionarybasedy = 3.2461000<br />functionbasedy = 3.2461000<br />Testing dictionary based y == function based y for loc2, 4, 2, 5000.00000 . . .<br />dictionarybasedy = 0.7461000<br />functionbasedy = 0.7461000<br />Testing dictionary based y == function based y for loc2, 5, 1, 0.00000 . . .<br />dictionarybasedy = 2.1000000<br />functionbasedy = 2.1000000<br />Testing dictionary based y == function based y for loc2, 5, 1, 5000.00000 . . .<br />dictionarybasedy = 2.1000000<br />functionbasedy = 2.1000000<br />Testing dictionary based y == function based y for loc2, 5, 2, 0.00000 . . .<br />dictionarybasedy = 2.9564000<br />functionbasedy = 2.9564000<br />Testing dictionary based y == function based y for loc2, 5, 2, 5000.00000 . . .<br />dictionarybasedy = 1.4564000<br />functionbasedy = 1.4564000<br />Testing dictionary based y == function based y for loc2, 6, 1, 0.00000 . . .<br />dictionarybasedy = 3.7310000<br />functionbasedy = 3.7310000<br />Testing dictionary based y == function based y for loc2, 6, 1, 5000.00000 . . .<br />dictionarybasedy = -1.2690000<br />functionbasedy = -1.2690000<br />Testing dictionary based y == function based y for loc2, 6, 2, 0.00000 . . .<br />dictionarybasedy = 4.0757000<br />functionbasedy = 4.0757000<br />Testing dictionary based y == function based y for loc2, 6, 2, 5000.00000 . . .<br />dictionarybasedy = -1.9243000<br />functionbasedy = -1.9243000<br />Testing dictionary based y == function based y for loc2, 7, 1, 0.00000 . . .<br />dictionarybasedy = 3.8860000<br />functionbasedy = 3.8860000<br />Testing dictionary based y == function based y for loc2, 7, 1, 5000.00000 . . .<br />dictionarybasedy = -2.1140000<br />functionbasedy = -2.1140000<br />Testing dictionary based y == function based y for loc2, 7, 2, 0.00000 . . 
.<br />dictionarybasedy = 2.6787000<br />functionbasedy = 2.6787000<br />Testing dictionary based y == function based y for loc2, 7, 2, 5000.00000 . . .<br />dictionarybasedy = 2.3287000<br />functionbasedy = 2.3287000<br />Testing dictionary based y == function based y for loc2, 8, 1, 0.00000 . . .<br />dictionarybasedy = 2.6500000<br />functionbasedy = 2.6500000<br />Testing dictionary based y == function based y for loc2, 8, 1, 5000.00000 . . .<br />dictionarybasedy = 2.6500000<br />functionbasedy = 2.6500000<br />Testing dictionary based y == function based y for loc2, 8, 2, 0.00000 . . .<br />dictionarybasedy = 2.6500000<br />functionbasedy = 2.6500000<br />Testing dictionary based y == function based y for loc2, 8, 2, 5000.00000 . . .<br />dictionarybasedy = 2.6500000<br />functionbasedy = 2.6500000<br />Testing dictionary based y == function based y for loc2, 9, 1, 0.00000 . . .<br />dictionarybasedy = 3.3121000<br />functionbasedy = 3.3121000<br />Testing dictionary based y == function based y for loc2, 9, 1, 5000.00000 . . .<br />dictionarybasedy = 0.3121000<br />functionbasedy = 0.3121000<br />Testing dictionary based y == function based y for loc2, 9, 2, 0.00000 . . .<br />dictionarybasedy = 3.3121000<br />functionbasedy = 3.3121000<br />Testing dictionary based y == function based y for loc2, 9, 2, 5000.00000 . . .<br />dictionarybasedy = 0.3121000<br />functionbasedy = 0.3121000<br />Testing dictionary based y == function based y for loc3, 1, 1, 0.00000 . . .<br />dictionarybasedy = 2.4900000<br />functionbasedy = 2.4900000<br />Testing dictionary based y == function based y for loc3, 1, 1, 1.00000 . . .<br />dictionarybasedy = 2.4900000<br />functionbasedy = 2.4900000<br />Testing dictionary based y == function based y for loc3, 1, 1, 1.00001 . . .<br />dictionarybasedy = 2.8853000<br />functionbasedy = 2.8853000<br />Testing dictionary based y == function based y for loc3, 1, 1, 5000.00000 . . 
.<br />dictionarybasedy = 0.8857000<br />functionbasedy = 0.8857000<br />Testing dictionary based y == function based y for loc3, 1, 2, 0.00000 . . .<br />dictionarybasedy = 2.4900000<br />functionbasedy = 2.4900000<br />Testing dictionary based y == function based y for loc3, 1, 2, 1.00000 . . .<br />dictionarybasedy = 2.4900000<br />functionbasedy = 2.4900000<br />Testing dictionary based y == function based y for loc3, 1, 2, 1.00001 . . .<br />dictionarybasedy = 2.8853000<br />functionbasedy = 2.8853000<br />Testing dictionary based y == function based y for loc3, 1, 2, 5000.00000 . . .<br />dictionarybasedy = 0.8857000<br />functionbasedy = 0.8857000<br />Returning default Y for loc3, 99, 1, 0.00000 . . .<br />Testing dictionary based y == function based y for loc3, 99, 1, 0.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Returning default Y for loc3, 99, 1, 5000.00000 . . .<br />Testing dictionary based y == function based y for loc3, 99, 1, 5000.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Returning default Y for loc3, 99, 2, 0.00000 . . .<br />Testing dictionary based y == function based y for loc3, 99, 2, 0.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Returning default Y for loc3, 99, 2, 5000.00000 . . .<br />Testing dictionary based y == function based y for loc3, 99, 2, 5000.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Testing dictionary based y == function based y for loc3, 4, 1, 0.00000 . . .<br />dictionarybasedy = 2.6800000<br />functionbasedy = 2.6800000<br />Testing dictionary based y == function based y for loc3, 4, 1, 1.10000 . . .<br />dictionarybasedy = 2.6800000<br />functionbasedy = 2.6800000<br />Testing dictionary based y == function based y for loc3, 4, 1, 1.10001 . . 
.<br />dictionarybasedy = 4.6670200<br />functionbasedy = 4.6670200<br />Testing dictionary based y == function based y for loc3, 4, 1, 5000.00000 . . .<br />dictionarybasedy = -4.3310000<br />functionbasedy = -4.3310000<br />Testing dictionary based y == function based y for loc3, 4, 2, 0.00000 . . .<br />dictionarybasedy = 2.6800000<br />functionbasedy = 2.6800000<br />Testing dictionary based y == function based y for loc3, 4, 2, 1.10000 . . .<br />dictionarybasedy = 2.6800000<br />functionbasedy = 2.6800000<br />Testing dictionary based y == function based y for loc3, 4, 2, 1.10001 . . .<br />dictionarybasedy = 4.6670200<br />functionbasedy = 4.6670200<br />Testing dictionary based y == function based y for loc3, 4, 2, 5000.00000 . . .<br />dictionarybasedy = -4.3310000<br />functionbasedy = -4.3310000<br />Testing dictionary based y == function based y for loc3, 5, 1, 950.01000 . . .<br />dictionarybasedy = 2.5495900<br />functionbasedy = 2.5495900<br />Testing dictionary based y == function based y for loc3, 5, 1, 5000.00000 . . .<br />dictionarybasedy = -1.5004000<br />functionbasedy = -1.5004000<br />Testing dictionary based y == function based y for loc3, 5, 1, 0.00000 . . .<br />dictionarybasedy = 2.5900000<br />functionbasedy = 2.5900000<br />Testing dictionary based y == function based y for loc3, 5, 1, 950.00000 . . .<br />dictionarybasedy = 2.5900000<br />functionbasedy = 2.5900000<br />Testing dictionary based y == function based y for loc3, 5, 2, 950.01000 . . .<br />dictionarybasedy = 2.5495900<br />functionbasedy = 2.5495900<br />Testing dictionary based y == function based y for loc3, 5, 2, 5000.00000 . . .<br />dictionarybasedy = -1.5004000<br />functionbasedy = -1.5004000<br />Testing dictionary based y == function based y for loc3, 5, 2, 0.00000 . . .<br />dictionarybasedy = 2.5900000<br />functionbasedy = 2.5900000<br />Testing dictionary based y == function based y for loc3, 5, 2, 950.00000 . . 
.<br />dictionarybasedy = 2.5900000<br />functionbasedy = 2.5900000<br />Testing dictionary based y == function based y for loc3, 6, 1, 0.00000 . . .<br />dictionarybasedy = 2.6500000<br />functionbasedy = 2.6500000<br />Testing dictionary based y == function based y for loc3, 6, 1, 1.08000 . . .<br />dictionarybasedy = 2.6500000<br />functionbasedy = 2.6500000<br />Testing dictionary based y == function based y for loc3, 6, 1, 1.08001 . . .<br />dictionarybasedy = 4.0650960<br />functionbasedy = 4.0650960<br />Testing dictionary based y == function based y for loc3, 6, 1, 5000.00000 . . .<br />dictionarybasedy = -2.4335000<br />functionbasedy = -2.4335000<br />Testing dictionary based y == function based y for loc3, 6, 2, 0.00000 . . .<br />dictionarybasedy = 2.6500000<br />functionbasedy = 2.6500000<br />Testing dictionary based y == function based y for loc3, 6, 2, 1.08000 . . .<br />dictionarybasedy = 2.6500000<br />functionbasedy = 2.6500000<br />Testing dictionary based y == function based y for loc3, 6, 2, 1.08001 . . .<br />dictionarybasedy = 4.0650960<br />functionbasedy = 4.0650960<br />Testing dictionary based y == function based y for loc3, 6, 2, 5000.00000 . . .<br />dictionarybasedy = -2.4335000<br />functionbasedy = -2.4335000<br />Testing dictionary based y == function based y for loc3, 7, 1, 0.00000 . . .<br />dictionarybasedy = 2.6700000<br />functionbasedy = 2.6700000<br />Testing dictionary based y == function based y for loc3, 7, 1, 1.05000 . . .<br />dictionarybasedy = 2.6700000<br />functionbasedy = 2.6700000<br />Testing dictionary based y == function based y for loc3, 7, 1, 1.05001 . . .<br />dictionarybasedy = 3.5919550<br />functionbasedy = 3.5919550<br />Testing dictionary based y == function based y for loc3, 7, 1, 5000.00000 . . .<br />dictionarybasedy = -0.9071000<br />functionbasedy = -0.9071000<br />Testing dictionary based y == function based y for loc3, 7, 2, 0.00000 . . 
.<br />dictionarybasedy = 2.6700000<br />functionbasedy = 2.6700000<br />Testing dictionary based y == function based y for loc3, 7, 2, 1.05000 . . .<br />dictionarybasedy = 2.6700000<br />functionbasedy = 2.6700000<br />Testing dictionary based y == function based y for loc3, 7, 2, 1.05000 . . .<br />dictionarybasedy = 3.5919550<br />functionbasedy = 3.5919550<br />Testing dictionary based y == function based y for loc3, 7, 2, 5000.00000 . . .<br />dictionarybasedy = -0.9071000<br />functionbasedy = -0.9071000<br />Testing dictionary based y == function based y for loc3, 8, 1, 0.00000 . . .<br />dictionarybasedy = 2.6400000<br />functionbasedy = 2.6400000<br />Testing dictionary based y == function based y for loc3, 8, 1, 1.00000 . . .<br />dictionarybasedy = 2.6400000<br />functionbasedy = 2.6400000<br />Testing dictionary based y == function based y for loc3, 8, 1, 1.00001 . . .<br />dictionarybasedy = 3.3285000<br />functionbasedy = 3.3285000<br />Testing dictionary based y == function based y for loc3, 8, 1, 5000.00000 . . .<br />dictionarybasedy = 0.3291000<br />functionbasedy = 0.3291000<br />Testing dictionary based y == function based y for loc3, 8, 2, 0.00000 . . .<br />dictionarybasedy = 2.6400000<br />functionbasedy = 2.6400000<br />Testing dictionary based y == function based y for loc3, 8, 2, 1.00000 . . .<br />dictionarybasedy = 2.6400000<br />functionbasedy = 2.6400000<br />Testing dictionary based y == function based y for loc3, 8, 2, 1.00001 . . .<br />dictionarybasedy = 3.3285000<br />functionbasedy = 3.3285000<br />Testing dictionary based y == function based y for loc3, 8, 2, 5000.00000 . . .<br />dictionarybasedy = 0.3291000<br />functionbasedy = 0.3291000<br />Testing dictionary based y == function based y for loc3, 9, 1, 0.00000 . . .<br />dictionarybasedy = 2.4900000<br />functionbasedy = 2.4900000<br />Testing dictionary based y == function based y for loc3, 9, 1, 5000.00000 . . 
.<br />dictionarybasedy = 2.4900000<br />functionbasedy = 2.4900000<br />Testing dictionary based y == function based y for loc3, 9, 2, 0.00000 . . .<br />dictionarybasedy = 2.4900000<br />functionbasedy = 2.4900000<br />Testing dictionary based y == function based y for loc3, 9, 2, 5000.00000 . . .<br />dictionarybasedy = 2.4900000<br />functionbasedy = 2.4900000<br />Testing dictionary based y == function based y for loc1, 1, 1, 0.00000 . . .<br />dictionarybasedy = 2.3500000<br />functionbasedy = 2.3500000<br />Testing dictionary based y == function based y for loc1, 1, 1, 5000.00000 . . .<br />dictionarybasedy = 2.3500000<br />functionbasedy = 2.3500000<br />Testing dictionary based y == function based y for loc1, 1, 2, 0.00000 . . .<br />dictionarybasedy = 2.4500000<br />functionbasedy = 2.4500000<br />Testing dictionary based y == function based y for loc1, 1, 2, 5000.00000 . . .<br />dictionarybasedy = 2.4500000<br />functionbasedy = 2.4500000<br />Returning default Y for loc1, 99, 1, 0.00000 . . .<br />Testing dictionary based y == function based y for loc1, 99, 1, 0.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Returning default Y for loc1, 99, 1, 5000.00000 . . .<br />Testing dictionary based y == function based y for loc1, 99, 1, 5000.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Returning default Y for loc1, 99, 2, 0.00000 . . .<br />Testing dictionary based y == function based y for loc1, 99, 2, 0.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Returning default Y for loc1, 99, 2, 5000.00000 . . .<br />Testing dictionary based y == function based y for loc1, 99, 2, 5000.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Testing dictionary based y == function based y for loc1, 4, 1, 1.29001 . . 
.<br />dictionarybasedy = 6.7215720<br />functionbasedy = 6.7215720<br />Testing dictionary based y == function based y for loc1, 4, 1, 5000.00000 . . .<br />dictionarybasedy = -9.2743000<br />functionbasedy = -9.2743000<br />Testing dictionary based y == function based y for loc1, 4, 1, 0.00000 . . .<br />dictionarybasedy = 2.6000000<br />functionbasedy = 2.6000000<br />Testing dictionary based y == function based y for loc1, 4, 1, 1.29000 . . .<br />dictionarybasedy = 2.6000000<br />functionbasedy = 2.6000000<br />Testing dictionary based y == function based y for loc1, 4, 2, 0.00000 . . .<br />dictionarybasedy = 2.8215000<br />functionbasedy = 2.8215000<br />Testing dictionary based y == function based y for loc1, 4, 2, 5000.00000 . . .<br />dictionarybasedy = 1.8215000<br />functionbasedy = 1.8215000<br />Testing dictionary based y == function based y for loc1, 5, 1, 1.25001 . . .<br />dictionarybasedy = 5.7119500<br />functionbasedy = 5.7119500<br />Testing dictionary based y == function based y for loc1, 5, 1, 5000.00000 . . .<br />dictionarybasedy = -7.2848000<br />functionbasedy = -7.2848000<br />Testing dictionary based y == function based y for loc1, 5, 1, 0.00000 . . .<br />dictionarybasedy = 2.4700000<br />functionbasedy = 2.4700000<br />Testing dictionary based y == function based y for loc1, 5, 1, 1.25000 . . .<br />dictionarybasedy = 2.4700000<br />functionbasedy = 2.4700000<br />Testing dictionary based y == function based y for loc1, 5, 2, 0.00000 . . .<br />dictionarybasedy = 2.8733000<br />functionbasedy = 2.8733000<br />Testing dictionary based y == function based y for loc1, 5, 2, 5000.00000 . . .<br />dictionarybasedy = 1.3733000<br />functionbasedy = 1.3733000<br />Testing dictionary based y == function based y for loc1, 6, 1, 0.00000 . . .<br />dictionarybasedy = 2.5700000<br />functionbasedy = 2.5700000<br />Testing dictionary based y == function based y for loc1, 6, 1, 1.31000 . . 
.<br />dictionarybasedy = 2.5700000<br />functionbasedy = 2.5700000<br />Testing dictionary based y == function based y for loc1, 6, 1, 1.31001 . . .<br />dictionarybasedy = 4.9283420<br />functionbasedy = 4.9283420<br />Testing dictionary based y == function based y for loc1, 6, 1, 5000.00000 . . .<br />dictionarybasedy = -4.0693000<br />functionbasedy = -4.0693000<br />Testing dictionary based y == function based y for loc1, 6, 2, 0.00000 . . .<br />dictionarybasedy = 3.0612000<br />functionbasedy = 3.0612000<br />Testing dictionary based y == function based y for loc1, 6, 2, 5000.00000 . . .<br />dictionarybasedy = 1.0612000<br />functionbasedy = 1.0612000<br />Testing dictionary based y == function based y for loc1, 7, 1, 1.31501 . . .<br />dictionarybasedy = 6.5440550<br />functionbasedy = 6.5440550<br />Testing dictionary based y == function based y for loc1, 7, 1, 5000.00000 . . .<br />dictionarybasedy = -8.4520000<br />functionbasedy = -8.4520000<br />Testing dictionary based y == function based y for loc1, 7, 1, 0.00000 . . .<br />dictionarybasedy = 2.6000000<br />functionbasedy = 2.6000000<br />Testing dictionary based y == function based y for loc1, 7, 1, 1.31500 . . .<br />dictionarybasedy = 2.6000000<br />functionbasedy = 2.6000000<br />Testing dictionary based y == function based y for loc1, 7, 2, 0.00000 . . .<br />dictionarybasedy = 2.9580000<br />functionbasedy = 2.9580000<br />Testing dictionary based y == function based y for loc1, 7, 2, 5000.00000 . . .<br />dictionarybasedy = -12.5420000<br />functionbasedy = -12.5420000<br />Testing dictionary based y == function based y for loc1, 8, 1, 0.00000 . . .<br />dictionarybasedy = 2.6000000<br />functionbasedy = 2.6000000<br />Testing dictionary based y == function based y for loc1, 8, 1, 5000.00000 . . .<br />dictionarybasedy = 2.6000000<br />functionbasedy = 2.6000000<br />Testing dictionary based y == function based y for loc1, 8, 2, 0.00000 . . 
.<br />dictionarybasedy = 2.6000000<br />functionbasedy = 2.6000000<br />Testing dictionary based y == function based y for loc1, 8, 2, 5000.00000 . . .<br />dictionarybasedy = 2.6000000<br />functionbasedy = 2.6000000<br />Testing dictionary based y == function based y for loc1, 9, 1, 0.00000 . . .<br />dictionarybasedy = 2.5300000<br />functionbasedy = 2.5300000<br />Testing dictionary based y == function based y for loc1, 9, 1, 1.27500 . . .<br />dictionarybasedy = 2.5300000<br />functionbasedy = 2.5300000<br />Testing dictionary based y == function based y for loc1, 9, 1, 1.27501 . . .<br />dictionarybasedy = 6.4777175<br />functionbasedy = 6.4777175<br />Testing dictionary based y == function based y for loc1, 9, 1, 5000.00000 . . .<br />dictionarybasedy = 4.9781000<br />functionbasedy = 4.9781000<br />Testing dictionary based y == function based y for loc1, 9, 2, 0.00000 . . .<br />dictionarybasedy = 2.5400000<br />functionbasedy = 2.5400000<br />Testing dictionary based y == function based y for loc1, 9, 2, 5000.00000 . . .<br />dictionarybasedy = 2.5400000<br />functionbasedy = 2.5400000<br />Testing dictionary based y == function based y for loc4, 1, 1, 0.00000 . . .<br />dictionarybasedy = 2.7802000<br />functionbasedy = 2.7802000<br />Testing dictionary based y == function based y for loc4, 1, 1, 5000.00000 . . .<br />dictionarybasedy = 1.2802000<br />functionbasedy = 1.2802000<br />Testing dictionary based y == function based y for loc4, 1, 2, 0.00000 . . .<br />dictionarybasedy = 2.7802000<br />functionbasedy = 2.7802000<br />Testing dictionary based y == function based y for loc4, 1, 2, 5000.00000 . . .<br />dictionarybasedy = 1.2802000<br />functionbasedy = 1.2802000<br />Returning default Y for loc4, 99, 1, 0.00000 . . .<br />Testing dictionary based y == function based y for loc4, 99, 1, 0.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Returning default Y for loc4, 99, 1, 5000.00000 . . 
.<br />Testing dictionary based y == function based y for loc4, 99, 1, 5000.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Returning default Y for loc4, 99, 2, 0.00000 . . .<br />Testing dictionary based y == function based y for loc4, 99, 2, 0.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Returning default Y for loc4, 99, 2, 5000.00000 . . .<br />Testing dictionary based y == function based y for loc4, 99, 2, 5000.00000 . . .<br />dictionarybasedy = 2.5000000<br />functionbasedy = 2.5000000<br />Testing dictionary based y == function based y for loc4, 4, 1, 0.00000 . . .<br />dictionarybasedy = 2.6000000<br />functionbasedy = 2.6000000<br />Testing dictionary based y == function based y for loc4, 4, 1, 1.00000 . . .<br />dictionarybasedy = 2.6000000<br />functionbasedy = 2.6000000<br />Testing dictionary based y == function based y for loc4, 4, 1, 1.00001 . . .<br />dictionarybasedy = 4.5528000<br />functionbasedy = 4.5528000<br />Testing dictionary based y == function based y for loc4, 4, 1, 5000.00000 . . .<br />dictionarybasedy = -5.4452000<br />functionbasedy = -5.4452000<br />Testing dictionary based y == function based y for loc4, 4, 2, 0.00000 . . .<br />dictionarybasedy = 2.6500000<br />functionbasedy = 2.6500000<br />Testing dictionary based y == function based y for loc4, 4, 2, 1125.00000 . . .<br />dictionarybasedy = 2.6500000<br />functionbasedy = 2.6500000<br />Testing dictionary based y == function based y for loc4, 4, 2, 1125.01000 . . .<br />dictionarybasedy = 2.6808890<br />functionbasedy = 2.6808890<br />Testing dictionary based y == function based y for loc4, 4, 2, 5000.00000 . . .<br />dictionarybasedy = -1.5816000<br />functionbasedy = -1.5816000<br />Testing dictionary based y == function based y for loc4, 5, 1, 0.00000 . . .<br />dictionarybasedy = 4.2758000<br />functionbasedy = 4.2758000<br />Testing dictionary based y == function based y for loc4, 5, 1, 5000.00000 . 
. .<br />dictionarybasedy = -4.7242000<br />functionbasedy = -4.7242000<br />Testing dictionary based y == function based y for loc4, 5, 2, 0.00000 . . .<br />dictionarybasedy = 2.6535000<br />functionbasedy = 2.6535000<br />Testing dictionary based y == function based y for loc4, 5, 2, 5000.00000 . . .<br />dictionarybasedy = 2.1535000<br />functionbasedy = 2.1535000<br />Testing dictionary based y == function based y for loc4, 6, 1, 0.00000 . . .<br />dictionarybasedy = 2.5800000<br />functionbasedy = 2.5800000<br />Testing dictionary based y == function based y for loc4, 6, 1, 1.10000 . . .<br />dictionarybasedy = 2.5800000<br />functionbasedy = 2.5800000<br />Testing dictionary based y == function based y for loc4, 6, 1, 1.10001 . . .<br />dictionarybasedy = 4.0307700<br />functionbasedy = 4.0307700<br />Testing dictionary based y == function based y for loc4, 6, 1, 5000.00000 . . .<br />dictionarybasedy = -2.4678000<br />functionbasedy = -2.4678000<br />Testing dictionary based y == function based y for loc4, 6, 2, 0.00000 . . .<br />dictionarybasedy = 2.8081000<br />functionbasedy = 2.8081000<br />Testing dictionary based y == function based y for loc4, 6, 2, 5000.00000 . . .<br />dictionarybasedy = 1.8081000<br />functionbasedy = 1.8081000<br />Testing dictionary based y == function based y for loc4, 7, 1, 0.00000 . . .<br />dictionarybasedy = 2.6000000<br />functionbasedy = 2.6000000<br />Testing dictionary based y == function based y for loc4, 7, 1, 1.00000 . . .<br />dictionarybasedy = 2.6000000<br />functionbasedy = 2.6000000<br />Testing dictionary based y == function based y for loc4, 7, 1, 1.00001 . . .<br />dictionarybasedy = 4.3884000<br />functionbasedy = 4.3884000<br />Testing dictionary based y == function based y for loc4, 7, 1, 5000.00000 . . .<br />dictionarybasedy = -4.6098000<br />functionbasedy = -4.6098000<br />Testing dictionary based y == function based y for loc4, 7, 2, 0.00000 . . 
.<br />dictionarybasedy = 2.7334000<br />functionbasedy = 2.7334000<br />Testing dictionary based y == function based y for loc4, 7, 2, 5000.00000 . . .<br />dictionarybasedy = 2.2834000<br />functionbasedy = 2.2834000</span></span><br />
<br />
<span style="font-family: inherit;">And that's it. It <span style="font-family: inherit;">bails on an <span style="font-family: inherit;">AssertionError - I like to fix problems one at a time. It took me about six runs <span style="font-family: inherit;">to get <span style="font-family: inherit;">everything matched<span style="font-family: inherit;">.</span></span></span></span></span></span><br />
<span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><br /></span></span></span></span></span></span>
<span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"></span></span></span></span></span></span><br />
<span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;">Thank you for stopping by.</span></span></span></span></span></span>Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com2tag:blogger.com,1999:blog-524230429673765509.post-89415568588984258342016-08-11T12:45:00.001-07:002016-08-11T14:51:03.091-07:00sqlcmd faux csv dump and parsing with the csv moduleLately I had another Excel-VBA-Python one off hack project. Once again there was the dilemma of not being able to use MSSQL's bcp because my query string was too long. sqlcmd can run a query from a big SQL file, but, to the best of my knowledge, it does not do csv dumps.<br />
<br />
This is a hack. I would normally go to hell for it, but I've done so many other bad hacks I'd have to declare bankruptcy on my programming soul and start over. Onward.<br />
<br />
MSSQL query file:<br />
<br />
<span style="color: blue;"><strong><span style="font-family: "courier new"; font-size: large;"><SQL code></span></strong><br /><br /><span style="font-family: "courier new" , "courier" , monospace; font-size: large;"><strong>< . . . variable declarations, temp table declarations, etc. . . . ></strong></span></span><br />
<strong><span style="color: blue; font-family: "courier new"; font-size: large;"></span></strong><br />
<strong><span style="color: blue; font-family: "courier new"; font-size: large;">DECLARE @COMMA CHAR(1) = ',';<br />DECLARE @LOSSLESS INT = 3;</span></strong><br />
<strong><span style="color: blue; font-family: "courier new"; font-size: large;">DECLARE @DOUBLEQUOTE CHAR(1) = CHAR(34);</span></strong><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong>-- Concatenate strings.<br />-- Need quoted strings for stockpiles with spaces.<br />SELECT @DOUBLEQUOTE + StockpileShortName +</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> @DOUBLEQUOTE + @COMMA +</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> @DOUBLEQUOTE + StockpileID +</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> @DOUBLEQUOTE + @COMMA +<br /> @DOUBLEQUOTE + StkLoc +</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> @DOUBLEQUOTE + @COMMA +<br /> -- Go for full float precision.<br /> CONVERT(VARCHAR(35),</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> tonnes,</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> @LOSSLESS) + @COMMA +</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> CONVERT(VARCHAR(35),</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> grade01,</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> @LOSSLESS) + @COMMA +<br /> CONVERT(VARCHAR(35),</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> grade02,</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> @LOSSLESS) + @COMMA +<br /> CONVERT(VARCHAR(35),</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> grade03,</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> @LOSSLESS) + @COMMA +<br /> CONVERT(VARCHAR(35), </strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> grade04,</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> @LOSSLESS) + @COMMA +<br /> CONVERT(VARCHAR(35),</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> grade05,</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> @LOSSLESS) + @COMMA +<br /> CONVERT(VARCHAR(35),</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> grade06</strong></span><span style="color: blue;"><strong>,</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><span style="color: blue;"><strong> @LOSSLESS) <br />FROM ##inputresultspvctrachte</strong></span></span><br />
<span style="font-family: "courier new"; font-size: large;"><strong><br /><span style="color: blue;">< . . . ORDER BY clause . . .></span></strong></span><br />
<span style="color: blue; font-family: "courier new"; font-size: large;"><strong><End SQL code></strong></span><br />
<br />
It's pretty obvious what I'm doing (and I'd be shocked if I'm the first to do it): concatenate all the fields of each result record into one comma-separated string, so every record comes out as a single line.<br />
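For comparison, Python's own csv module writes the same shape of line: with csv.QUOTE_NONNUMERIC, csv.writer quotes every string field and leaves numbers bare, and csv.reader with the same setting reads it back. A minimal sketch with made-up placeholder rows (not the real stockpile data):<br />

```python
import csv
import io

# Hypothetical placeholder rows: three string identifiers, then floats.
rows = [
    ["KEY003", "hakunamatadacopper", "good", 28776.5, 1.25],
    ["KEY016", "greenrock", "good", 160.0, 0.5],
]

buf = io.StringIO()
# QUOTE_NONNUMERIC quotes every non-numeric field and leaves numbers
# unquoted, the same layout the SQL query builds by hand.
writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
writer.writerows(rows)
print(buf.getvalue(), end="")
# "KEY003","hakunamatadacopper","good",28776.5,1.25
# "KEY016","greenrock","good",160.0,0.5
```
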
<br />
A couple of notes:<br />
<br />
1) All my string identifiers are in double quotes, and all my float values are unquoted text. This will help simplify the Python csv module code below.<br />
<br />
2) the @LOSSLESS "constant" - Microsoft's SQL documentation doesn't list an enumeration for this per se. It's just a straight up whole number 3. I'm a bit obsessive about constants - wrap that baby in a variable declaration! Lossless double precision means, if I recall correctly, SQL Server will give you seventeen digits of precision. This works for what I'm doing (mining stockpile management).<br />
<br />
The (rough) sqlcmd command to run the query file from a DOS prompt:<br />
<br />
<span style="color: #660000; font-family: "courier new"; font-size: large;"><strong>sqlcmd -S MYSERVERNAME -U MYUSERNAME -P MYPASSWORD -I myqueryfile.sql -o theoutputfile.csv -b</strong></span><br />
<span style="font-family: "courier new"; font-size: large;"><strong><span style="color: blue;"></span></strong></span><br />
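The -b switch makes sqlcmd abort on an error and return a nonzero ERRORLEVEL to the shell. It's a crude check for whether the query parsed OK and ran, but it's better than nothing. Note that the input file switch is a lowercase -i; uppercase -I turns on quoted identifiers instead.<br />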
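If the query gets kicked off from Python instead of typed at the prompt, that exit code can be checked directly. A minimal sketch, assuming sqlcmd is on the PATH; the function names are hypothetical, and check=True makes subprocess.run raise subprocess.CalledProcessError when sqlcmd exits nonzero:<br />

```python
import subprocess

def build_sqlcmd_args(server, user, password, queryfile, outputfile):
    """Hypothetical helper: the sqlcmd command line from above as a list."""
    return [
        "sqlcmd",
        "-S", server,
        "-U", user,
        "-P", password,
        "-i", queryfile,    # lowercase -i: input query file
        "-o", outputfile,   # output file
        "-b",               # abort on error with a nonzero exit code
    ]

def run_query(server, user, password, queryfile, outputfile):
    """Hypothetical wrapper: run sqlcmd, raise CalledProcessError on failure."""
    args = build_sqlcmd_args(server, user, password, queryfile, outputfile)
    subprocess.run(args, check=True)
```
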
<br />
The output looks something like this (sorry about the small font):<br />
<br />
<strong><span style="color: #274e13; font-family: "courier new"; font-size: x-small;"><. . . sqlcmd messages . . .></span></strong><br />
<strong><span style="color: #274e13; font-family: "courier new"; font-size: x-small;"></span></strong><br />
<strong><span style="color: #274e13; font-family: "courier new"; font-size: x-small;">"KEY003","hakunamatadacopper","good",28776.5,X.XXXXX,X.XXXXX,X.XXXXX,X.XXXXXX,XX.XXXX,X.XXXXX<br />"KEY005","tembomalachite","not as good",25855.9,X.XXXXX,X.XXXXXX,X.XXXXX,X.XXXXXX,XX.XXXX,X.XXXXX<br />"KEY006","simbacobalt","not as good",156767,X.XXXXXX,X.XXXXXXX,X.XXXXXX,X.XXXXXXX,XX.XXXX,X.XXXXXX<br />"KEY010","jambocobalt","good",488977,X.XXXXX,X.XXXXXX,X.XXXX,X.XXXXXX,XXX.XXX,X.XXXXX<br />"KEY015","cucoagogo","good",39576.7,X.XXXX,X.XXXXXX,X.XXXXX,X.XXXXXX,XX.XXXX,X.XXXXX<br />"KEY016","greenrock","good",160,X.XXX,X.XXX,X.XXX,X.XXX,XXX.XX,X.XX<br />"KEY033","pinkrock","not as good",81504.3,X.XXXXX,X.XXXXXX,X.XXXXX,X.XXXXXX,XXX.XXX,X.XXXX<br />"KEY006","funkyleach","not as good",55866.1,X.XXXXXX,X.XXXXXX,X.XXXXXX,X.XXXXXX,XXX.XXX,X.XXXXXX<br />"KEY010","metalhome","good",30301.1,X.XXXXX,X.XXXXXX,X.XXXXX,X.XXXXXX,XXX.XX,X.XXXXX<br />"KEY015","boulderpile","good",2878.25,X.XX,X.XX,X.XXX,X.XXX,XX.XXX,X.XXX<br />"KEY033","berm","not as good",5309.97,X.XXXXX,X.XXXXXX,X.XXXXX,X.XXXXXX,XXX.XXX,X.XXXXX</span></strong><br />
<strong><span style="color: #274e13; font-family: "courier new"; font-size: x-small;">(11 rows affected)</span></strong><br />
<br />
I've given my stockpiles funny names and X'ed out the numeric grades to sanitize this, but you get the general idea.<br />
<br />
Now, finally, some Python code. I'll grab the lines of the file (faux csv) I want and parse them with the csv module's reader object. The whole deal is kind of verbose (I have a collections.namedtuple object that takes each "column" as an attribute), so I'm only going to show the part that segregates the lines I want and reads them with the csv reader. The wpx module has all of my constants and static data definitions in it. I still need to work out some of the whitespace issues; for now I brute-force strip leading and trailing whitespace from values.<br />
<br />
<span style="color: #990000; font-family: "courier new"; font-size: large;"><strong>def parsesqlcmdoutput():<br /> """<br /> Parse output from sqlcmd.</strong></span><br />
<span style="color: #274e13; font-family: "courier new"; font-size: large;"><strong><span style="color: #990000;"> Returns list of<br /> collections.namedtuple<br /> objects.<br /> """<br /> lines = []<br /> with open(wpx.OUTPUTFILE +</span></strong></span><br />
<span style="color: #274e13; font-family: "courier new"; font-size: large;"><strong><span style="color: #990000;"> wpx.CSVEXT, 'r') as f:<br /> # Get relevant lines.<br /> # Rip whitespace off end - excessive.</span></strong></span><br />
<span style="color: #274e13; font-family: "courier new"; font-size: large;"><strong><span style="color: #990000;"> # XXX - string overloading - hack.<br /> lines = [linex.strip() for</span></strong></span><br />
<span style="color: #274e13; font-family: "courier new"; font-size: large;"><strong><span style="color: #990000;"> linex in f if<br /> linex[0:wpx.STKFLAG[0]] ==</span></strong></span><br />
<span style="color: #274e13; font-family: "courier new"; font-size: large;"><strong><span style="color: #990000;"> wpx.STKFLAG[1]]<br /> rdr = csv.reader(lines, quoting =</span></strong></span><br />
<span style="color: #274e13; font-family: "courier new"; font-size: large;"><strong><span style="color: #990000;"> QUOTENONN)<br /> records = []<br /> for r in rdr:<br /> # Get rid of whitespace padding</span></strong></span><br />
<span style="color: #274e13; font-family: "courier new"; font-size: large;"><strong><span style="color: #990000;"> # around string values.<br /> for x in xrange(wpx.IHSTRIDX):<br /> r[x] = r[x].strip()<br /> records.append(wpx.INPUTRECORD(*r))<br /> return records</span></strong></span><br />
That csv.QUOTE_NONNUMERIC (quote non-numeric) setting is handy. As per the Python docs, the reader converts every unquoted field to a float. As long as my data are clean I should be good there (an unquoted non-numeric value raises a ValueError), and it saves me some explicit float conversion code.<br />
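A quick standalone check of that behavior, using a made-up line where fixed-width CHAR padding has ended up inside the quotes (the line and the column count are hypothetical, not real sqlcmd output):<br />

```python
import csv

# Hypothetical sqlcmd-style line with padding inside the quoted strings.
line = '"KEY003    ","hakunamatadacopper  ","good",28776.5,1.5'
NUMSTRINGS = 3  # hypothetical count of leading string columns

row = next(csv.reader([line], quoting=csv.QUOTE_NONNUMERIC))
# Quoted fields stay strings (padding and all); unquoted ones are floats.
print(row)  # ['KEY003    ', 'hakunamatadacopper  ', 'good', 28776.5, 1.5]
row[:NUMSTRINGS] = [s.strip() for s in row[:NUMSTRINGS]]
print(row)  # ['KEY003', 'hakunamatadacopper', 'good', 28776.5, 1.5]

# Why clean data matters: an unquoted non-numeric field goes through
# float() and raises ValueError.
try:
    next(csv.reader(['KEY016,"greenrock",160'], quoting=csv.QUOTE_NONNUMERIC))
except ValueError as err:
    print("bad field:", err)
```
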
<br />
The list comprehension produces a list of strings, and a list is an iterable the same way a file is, so the csv module's reader works fine on it.<br />
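Here is a self-contained sketch of the whole parse step that can be run without the database. The wpx constants are swapped for hypothetical stand-ins (a '"KEY' line prefix, three string columns, and made-up namedtuple field names), and the sample text is invented apart from the "(N rows affected)" footer shape shown earlier. It uses a generator expression instead of the list comprehension; that works too, as long as the file is still open while csv.reader consumes it:<br />

```python
import collections
import csv
import io

# Hypothetical stand-ins for the wpx constants.
STKPREFIX = '"KEY'   # data rows start with this
NUMSTRINGS = 3       # leading quoted string columns
InputRecord = collections.namedtuple(
    "InputRecord", "key name quality tonnes g1 g2 g3 g4 g5 g6")

def parse_sqlcmd_output(f):
    """Parse faux-csv sqlcmd output from the open file-like object f.

    Returns a list of InputRecord namedtuples.
    """
    # Keep only data rows; strip the padding sqlcmd adds to the line.
    lines = (line.strip() for line in f if line.startswith(STKPREFIX))
    records = []
    for row in csv.reader(lines, quoting=csv.QUOTE_NONNUMERIC):
        row[:NUMSTRINGS] = [s.strip() for s in row[:NUMSTRINGS]]
        records.append(InputRecord(*row))
    return records

# Made-up sample text standing in for the sqlcmd output file.
sample = io.StringIO(
    "(some sqlcmd message)\n"
    '"KEY003","hakunamatadacopper","good",28776.5,1,2,3,4,5,6\n'
    '"KEY016","greenrock","good",160,0.5,0.5,0.5,0.5,0.5,0.5\n'
    "\n(2 rows affected)\n")
for rec in parse_sqlcmd_output(sample):
    print(rec.key, rec.tonnes)
```
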
<br />
That's about it (minus a lot of background code - if you need that, let me know and I'll put it in the comments).<br />
<br />
Thanks for stopping by.<br />
<span style="font-family: "courier new"; font-size: large;"><br /><strong> </strong></span>Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com0tag:blogger.com,1999:blog-524230429673765509.post-68898106532504604222016-07-10T10:46:00.000-07:002016-07-10T17:41:10.360-07:00Using Generators and Coroutines to Merge Tabular Data (Drill Holes)I have some mining drill hole data that I need to merge into an old vendor FORTRAN input format. Basically I do a series of SQL pulls from the drillhole database to csv files, then merge the data. My methodology has been a bit brute force in matching the separate parts of the drill hole data (lists, opening and closing of files to find matching holes, etc.). My thought was that I could do this more elegantly and efficiently by iterating through the files with generators.<br />
<br />
The ability of generators to communicate with each other via the send() method intrigued me. I had always been a bit shy about using this language feature. My csv problem gave me a justification for checking it out.<br />
<br />
The reference I used was <a href="http://www.dabeaz.com/coroutines/Coroutines.pdf" target="_blank">Dr. Dave Beazley's 2009 PyCon Tutorial</a>. He does a nice job of explaining things as well as dispensing good advice. (I disobeyed the good advice in the interest of shoehorning coroutines into my solution; I'll cover this below.) Beazley defines a coroutine, in the context of generators and the "yield" keyword, as a generator in which "yield" is used more generally: to receive values sent to it, not just to produce values. That is the sense in which I use the word "coroutine" in this post.<br />
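A minimal illustration of "yield" used that more general way (a generic running-average coroutine, not code from Beazley's slides or from this project). A coroutine has to be primed with next() or send(None) before send() can deliver a value:<br />

```python
def running_average():
    """Coroutine: each value sent in updates the running average,
    which comes back out of the same yield expression."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # receive a value, hand back the average
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)            # prime: run up to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
```
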
<br />
Given my problem of a one-to-many relationship (one drill hole start survey to many drill hole interval values), I wrote a very simple (perhaps oversimplified) toy program to demo what I wanted to do with the real data:<br />
<br />
<b><span style="font-family: "courier new" , "courier" , monospace;">def coroutinex(subgenerator):<br /> """<br /> Generator function that consumes<br /> a key value sent from a higher<br /> level generator. This generator<br /> yields two tuples of the form<br /> (<boolean>, data). The boolean<br /> value indicates whether the key<br /> matches the data.<br /><br /> Returns a generator.<br /> """<br /> while True:<br /> # One entry point for send()/reset.<br /> keyx = yield<br /> subdatatop = next(subgenerator)<br /> if subdatatop[0] == keyx:<br /> yield (True, subdatatop)<br /> for subdataloop in subgenerator:<br /> if subdataloop[0] == keyx:<br /> yield (True, subdataloop)<br /> else:<br /> yield (False, subdataloop)<br /> break<br /> </span></b><br />
<b><span style="font-family: "courier new" , "courier" , monospace;">def toplevelgen(topleveliter, coroutinex):<br /> """<br /> Top level generator function.<br /><br /> subgenerator is a generator<br /> that this generator sends<br /> a key value to. The <br /> subgenerator yields a two<br /> tuple that communicates if<br /> the key matches or not.<br /><br /> Returns a generator.<br /> """<br /> # Get sub generator/coroutine initialized.<br /> coroutinex.send(None)<br /> # Variable for dealing with return<br /> # from sub-generator/coroutine.<br /> subvalue = False<br /> for keyx in topleveliter:<br /> yield keyx<br /> if subvalue:<br /> yield subvalue<br /> subvalue = coroutinex.send(keyx)<br /> # Get sub generator/coroutine re-initialized<br /> # after send() reset.<br /> if subvalue is None:<br /> # XXX - hack<br /> subvalue = coroutinex.send(keyx)<br /> yield subvalue<br /> for submessage in coroutinex:<br /> # XXX - another hack to deal with yield of None.<br /> if not submessage:<br /> continue<br /> subvalue = submessage<br /> # if submessage[0] is True, kick it out.<br /> if submessage[0]:<br /> yield submessage<br /> else:<br /> # Keep subvalue for after keyvalue<br /> # yield at top.<br /> break<br /><br />topleveliter = range(44, 55)<br />keysx = [44, 44, 44, 45, 45, 45, 45, 45,<br /> 46, 46, 46, 46, 46, 46, 46, 46,<br /> 47, 47, 47, 48, 48, 48, 48, 48,<br /> 49, 49, 49, 49, 49, 49, 50, 50,<br /> 51, 51, 51, 51, 51, 51, 51, 51,<br /> 52, 52, 52, 52, 52, 52, 52, 52,<br /> 53, 53, 53, 53, 53, 53, 53, 53,<br /> 54, 54, 54, 54, 54, 54, 54, 54]<br /><br />sequencex = range(1, len(keysx) + 1)<br />subgenerator = zip(keysx, sequencex)<br /><br />gensub = coroutinex(subgenerator)<br />genmain = toplevelgen(topleveliter, gensub)<br /><br />for x in genmain:<br /> print(x)</span></b><br />
<br />
<br />
<span style="font-family: inherit;">Output:</span><br />
<span style="font-family: inherit;"><br /><span style="font-family: inherit;"><b><span style="font-family: "courier new" , "courier" , monospace;">44<br />(True, (44, 1))<br />(True, (44, 2))<br />(True, (44, 3))<br />45<br />(False, (45, 4))<br />(True, (45, 5))<br />(True, (45, 6))<br />(True, (45, 7))<br />(True, (45, 8))<br />46<br />(False, (46, 9))<br />(True, (46, 10))<br />(True, (46, 11))<br />(True, (46, 12))<br />(True, (46, 13))<br />(True, (46, 14))<br />(True, (46, 15))<br />(True, (46, 16))<br />47<br />(False, (47, 17))<br />(True, (47, 18))<br />(True, (47, 19))<br />48<br />(False, (48, 20))<br />(True, (48, 21))<br />(True, (48, 22))<br />(True, (48, 23))<br />(True, (48, 24))<br />49<br />(False, (49, 25))<br />(True, (49, 26))<br />(True, (49, 27))<br />(True, (49, 28))<br />(True, (49, 29))<br />(True, (49, 30))<br />50<br />(False, (50, 31))<br />(True, (50, 32))<br />51<br />(False, (51, 33))<br />(True, (51, 34))<br />(True, (51, 35))<br />(True, (51, 36))<br />(True, (51, 37))<br />(True, (51, 38))<br />(True, (51, 39))<br />(True, (51, 40))<br />52<br />(False, (52, 41))<br />(True, (52, 42))<br />(True, (52, 43))<br />(True, (52, 44))<br />(True, (52, 45))<br />(True, (52, 46))<br />(True, (52, 47))<br />(True, (52, 48))<br />53<br />(False, (53, 49))<br />(True, (53, 50))<br />(True, (53, 51))<br />(True, (53, 52))<br />(True, (53, 53))<br />(True, (53, 54))<br />(True, (53, 55))<br />(True, (53, 56))<br />54<br />(False, (54, 57))<br />(True, (54, 58))<br />(True, (54, 59))<br />(True, (54, 60))<br />(True, (54, 61))<br />(True, (54, 62))<br />(True, (54, 63))<br />(True, (54, 64))</span></b></span></span><br />
<br />
<span style="font-family: inherit;">Back to Dr. Beazley's advice<span style="font-family: inherit;"> - he<span style="font-family: inherit;"> doesn't recommend thi<span style="font-family: inherit;">s - even though <span style="font-family: inherit;">"yield" is the keywor<span style="font-family: inherit;">d<span style="font-family: inherit;">, it means t<span style="font-family: inherit;">wo different things in two different contexts. Do not mix generator and coroutine functionality. I'm going ahead in this <span style="font-family: inherit;">post and doing it anyway. I don't have an excuse. <span style="font-family: inherit;">It does remind me of some old Bob Dylan lyrics:</span></span></span></span></span></span></span></span></span></span><br />
<br />
<div style="text-align: left;">
<b><i>Now the rainman gave me two cures<br />Then he said, "Jump right in"<br />The one was Texas medicine<br />The other was just railroad gin<br />An' like a fool I mixed them<br />An' it strangled up my mind</i></b><br />
<br /></div>
<div style="text-align: left;">
</div>
<div style="text-align: left;">
<span style="font-family: inherit;">It's OK, Bob, some of us just need to learn thi<span style="font-family: inherit;">ngs the hard way.<br /><span style="font-family: inherit;"> </span></span></span></div>
<div style="text-align: left;">
<span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;">Onward.</span></span></span><br />
<br /></div>
<div style="text-align: left;">
</div>
<div style="text-align: left;">
<span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;">A brief diversion on drill holes<span style="font-family: inherit;"> - the data for <span style="font-family: inherit;">a <span style="font-family: inherit;">small <span style="font-family: inherit;">scale (about 2,000 feet or less) <span style="font-family: inherit;">geotechnical or gelogic drill hole come back in <span style="font-family: inherit;">three parts<span style="font-family: inherit;">:</span></span></span></span></span></span></span></span></span></span><br />
<span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><br /><span style="font-family: inherit;">1) collar - where the hole starts<span style="font-family: inherit;"> in space <span style="font-family: inherit;">(coordinates).</span></span></span></span></span></span></span></span></span></span></span></span></span><br />
<br /></div>
<div style="text-align: left;">
</div>
<div style="text-align: left;">
<span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;">2) surveys - where the hole ends up going in space relative to the collar (drill pipe <span style="font-family: inherit;">has proven to be amazingly flexible when passing through rock).</span></span></span></span></span></span></span></span></span></span></span></span></span></span><br />
<br /></div>
<div style="text-align: left;">
</div>
<div style="text-align: left;">
<span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;">3) assays - usually the hole is <span style="font-family: inherit;">sample<span style="font-family: inherit;">d along intervals and chemically or <span style="font-family: inherit;">physically analyzed. The assay intervals <span style="font-family: inherit;">may or may not coincide with survey intervals.</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<span style="font-family: inherit;">Clear as (drilling) mud? Great - back to Python.</span><br />
<br /></div>
<div style="text-align: left;">
<span style="font-family: inherit;">The problem:</span><br />
<br /></div>
<div style="text-align: left;">
<span style="font-family: inherit;">Three tabular csv dumps from SQL: a collar file, a survey file, and an assay file. The first column of each file holds the drill hole key, which is unique in the collar file and repeats once per interval in the survey and assay files. On the SQL side I have ensured that there are no orphan key rows in any of the three files and that all three are sorted on the key.</span><br />
<br /></div>
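<div style="text-align: left;">
<span style="font-family: inherit;">For comparison, a different technique (not the coroutine approach used in this post) can lean on the same guarantees: because all three files are sorted on the key and have no orphans, Python's itertools.groupby can split the survey and assay streams into per-hole groups that are walked in lockstep with the collar rows. This is a minimal sketch under those assumptions; merge_by_hole and the sample rows are hypothetical, and rows are plain lists from csv.reader rather than namedtuples.</span></div>

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Sentinel returned when a survey or assay stream runs out early.
_EXHAUSTED = (None, iter(()))


def merge_by_hole(collar_rows, survey_rows, assay_rows):
    """Yield (collar, surveys, assays) for each drill hole.

    Hypothetical helper. Assumes all three inputs are sorted on the
    drill hole key in column 0 and contain no orphan keys.
    """
    key = itemgetter(0)
    survey_groups = groupby(survey_rows, key=key)
    assay_groups = groupby(assay_rows, key=key)
    for collar in collar_rows:
        survey_key, surveys = next(survey_groups, _EXHAUSTED)
        assay_key, assays = next(assay_groups, _EXHAUSTED)
        # With sorted, orphan-free input the three keys always agree.
        if not collar[0] == survey_key == assay_key:
            raise ValueError('files out of step at hole {0}'.format(collar[0]))
        # Materialize each group before groupby advances past it.
        yield collar, list(surveys), list(assays)


# Hypothetical sample data standing in for the three SQL dumps.
COLLARS = 'H1,100.0\nH2,200.0\n'
SURVEYS = 'H1,0.0,1.5\nH1,1.5,3.0\nH2,0.0,2.0\n'
ASSAYS = 'H1,0.0,3.0,0.51\nH2,0.0,1.0,0.72\nH2,1.0,2.0,0.93\n'

readers = [csv.reader(io.StringIO(text)) for text in (COLLARS, SURVEYS, ASSAYS)]
for collar, surveys, assays in merge_by_hole(*readers):
    print(collar[0], len(surveys), len(assays))
# H1 2 1
# H2 1 2
```

<div style="text-align: left;">
<span style="font-family: inherit;">Unlike the coroutine version, this sketch holds one hole's surveys and assays in memory at a time (as lists), and it raises an error instead of silently misaligning if the sorted, no-orphan assumption is ever broken.</span></div>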
<div style="text-align: left;">
<span style="font-family: inherit;">I present the sanitized output here first; it gives some context for the domain-specific parts of the code:</span></div>
<div style="text-align: left;">
<span style="font-family: "courier new" , "courier" , monospace;"><b><br /></b></span></div>
<div style="text-align: left;">
<span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: "courier new" , "courier" , monospace;"><b>XXXXX,XXXXXX.XXXX,XXXXXXX.XXXX,XXXX.XXXX,0.0000,0.0000,26.4529<br />XXXXX,0.0000,1.1925,1.1925,283.5688,-13.5310 <br />XXXXX,1.1925,4.2760,3.0836,284.6224,1.9328 SURVEYS<br />XXXXX,4.2760,6.3799,2.1039,280.2829,-3.1334 GO<br />XXXXX,6.3799,9.7024,3.3225,282.5794,2.3632 HERE<br />XXXXX,9.7024,11.8701,2.1677,285.4406,-1.1631 AFTER<br />XXXXX,11.8701,13.6920,1.8219,275.9462,-5.0698 COLLAR<br />XXXXX,13.6920,17.1199,3.4279,285.4561,1.9560 LOCATION<br />XXXXX,17.1199,19.6944,2.5746,279.2318,-0.7344<br />XXXXX,19.6944,22.5857,2.8913,282.1947,4.3241<br />XXXXX,22.5857,24.1879,1.6022,283.8367,-1.7525<br />XXXXX,24.1879,26.4529,2.2650,287.3820,13.4805<br />XXXXX <----- LEGACY DRILLHOLE NUMBER<br />XXXXX,X.XXXX,X.XXXX,X.XXXX,X.XX,X.XX,X.XX, etc.<br />XXXXX,X.XXXX,X.XXXX,X.XXXX,X.XX,X.XX,X.XX, etc.<br />XXXXX,X.XXXX,X.XXXX,X.XXXX,X.XX,X.XX,X.XX, etc. ASSAYS<br />XXXXX,X.XXXX,X.XXXX,X.XXXX,X.XX,X.XX,X.XX, etc. GO<br />XXXXX,X.XXXX,XX.XXXX,X.XXXX,X.XX,X.XX,X.XX, etc. 
HERE<br />XXXXX,XX.XXXX,XX.XXXX,X.XXXX,X.XX,X.XX,X.XX,XX.XX, etc.<br />XXXXX,XX.XXXX,XX.XXXX,X.XXXX,X.XX,X.XX,X.XX,XX.XX, etc.<br />XXXXX,XX.XXXX,XX.XXXX,X.XXXX,X.XX,X.XX,X.XX,XX.XX, etc.<br />XXXXX,XX.XXXX,XX.XXXX,X.XXXX,X.XX,X.XX,X.XX,XX.XX, etc.<br />XXXXX,XX.XXXX,XX.XXXX,X.XXXX,X.XX,X.XX,X.XX,XX.XX, etc.<br />XXXXX,XX.XXXX,XX.XXXX,X.XXXX,X.XX,X.XX,X.XX,XX.XX, etc.<br /> <----- BLANK LINE<br />XXXXXX,XXXXXX.XXXX,XXXXXXX.XXXX,XXXX.XXXX,0.0000,0.0000,23.5411<br />XXXXXX,0.0000,2.5781,2.5781,135.0157,2.3341<br />XXXXXX,2.5781,5.0351,2.4570,137.1873,5.5353<br />XXXXXX,5.0351,7.3706,2.3354,135.2276,7.7020<br />XXXXXX,7.3706,9.9168,2.5462,136.4253,6.4493<br /> .<br /> .<br /> .<br /> .<br /> .<br /> .<br /> .<br /> etc.</b></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><br />
<br /></div>
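<div style="text-align: left;">
<span style="font-family: inherit;">To make the layout explicit, here is a minimal sketch that renders one hole in the format shown above: a collar line, the survey lines, the legacy hole number on its own line, the assay lines, and a blank line. hole_block, format_line, and the sample values are hypothetical; the number formats (.4f for collar, survey, and assay depth columns, .2f for assay values) and the two section terminators mirror the program that follows.</span></div>

```python
SURVEYFORMAT = '.4f'  # collar, survey, and assay depth columns
ASSAYFORMAT = '.2f'   # assay values


def format_line(label, values, formats):
    """Join the label and formatted numeric strings with commas."""
    fields = [label] + [format(float(v), f) for v, f in zip(values, formats)]
    return ','.join(fields) + '\n'


def hole_block(label, collar, surveys, assays):
    """Return one hole's text block (hypothetical helper)."""
    lines = [format_line(label, collar, [SURVEYFORMAT] * len(collar))]
    for row in surveys:
        lines.append(format_line(label, row, [SURVEYFORMAT] * len(row)))
    lines.append(label + '\n')  # legacy hole number ends the surveys
    for row in assays:
        # from, to, interval in survey precision; assays in 0.00 format
        fmts = 3 * [SURVEYFORMAT] + (len(row) - 3) * [ASSAYFORMAT]
        lines.append(format_line(label, row, fmts))
    lines.append('\n')  # blank line ends the assays
    return ''.join(lines)


print(hole_block('H1',
                 ['1000', '2000', '300', '0', '0', '2.5'],
                 [['0', '2.5', '2.5', '90', '-1.25']],
                 [['0', '2.5', '2.5', '0.5', '12']]), end='')
# H1,1000.0000,2000.0000,300.0000,0.0000,0.0000,2.5000
# H1,0.0000,2.5000,2.5000,90.0000,-1.2500
# H1
# H1,0.0000,2.5000,2.5000,0.50,12.00
# (blank line)
```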
<div style="text-align: left;">
<span style="font-family: inherit;">And the code (sorry about the length; it got messier than I had hoped):</span><br />
<br /></div>
<div style="text-align: left;">
<span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-family: "courier new" , "courier" , monospace;"><b><span style="font-size: x-small;">#!C:\Python35\python<br /><br />"""<br />Parse collar, survey, and assay dumps for<br />trenches from vendor drill hole RDBMS.<br /><br />Write specially formatted data file for<br />consumption by old vendor FORTRAN<br />routine 201.<br />"""<br /><br />import csv<br />from collections import namedtuple<br />from collections import OrderedDict<br /><br />COLLAR = './data/collar.csv'<br />SURVEY = './data/survey.csv'<br />ASSAYS = './data/assays.csv'<br />DAT201 = './data/TR.dat'<br /><br /># collar (ssit) fields<br />ID = 'drillholeid'<br />NAME = 'drillholename'<br />DATE = 'drillholedate'<br />LEGACY = 'drillholehistoricname'<br />X = 'collarx'<br />Y = 'collary'<br />Z = 'collarz'<br />AZ = 'azimuth'<br />DIP = 'dip'<br />LEN = 'drillholelength'<br /><br />COLLARFIELDS = [ID, NAME, DATE, LEGACY, X, Y, Z,<br /> AZ, DIP, LEN]<br /><br /># survey fields<br />FROM = 'fromx'<br />TO = 'depthto'<br />SAMPLEN = 'surveylength'<br />AZ = 'azimuth'<br />DIP = 'dip'<br /><br />SURVEYFIELDS = [ID, NAME, DATE, LEGACY, FROM, 
TO,<br /> SAMPLEN, AZ, DIP]<br /><br /># assay fields<br />AFROM = 'assayfrom'<br />ATO = 'assayto'<br />AI = 'assayinterval'<br />ASSAY1 = 'assay1'<br />ASSAY2 = 'assay2'<br />ASSAY3 = 'assay3'<br />ASSAY4 = 'assay4'<br />ASSAY5 = 'assay5'<br />ASSAY6 = 'assay6'<br />ASSAY7 = 'assay7'<br />ASSAY8 = 'assay8'<br /><br />ASSAYFIELDS = [ID, NAME, LEGACY, AFROM, ATO, AI, ASSAY1,<br /> ASSAY2, ASSAY3, ASSAY4, ASSAY5, ASSAY6, ASSAY7, ASSAY8]<br /><br />ASSAYFORMAT = '.2f'<br />SURVEYFORMAT = '.4f'<br /><br />COMMA = ','<br /><br /># Output for 201 file format.<br /># Collars.<br />COLOUTPUTCOLS = [X, Y, Z, AZ, DIP, LEN]<br />COLFMTOUTPUT = [(attribx, SURVEYFORMAT) for attribx in COLOUTPUTCOLS]<br /># Surveys.<br />SURVOUTPUTCOLS = [FROM, TO, SAMPLEN, AZ, DIP]<br />SURVFMTOUTPUT = [(attribx, SURVEYFORMAT) for attribx in SURVOUTPUTCOLS]<br /># Assays.<br />ASSYOUTPUTCOLS = [AFROM, ATO, AI, ASSAY1, ASSAY2, ASSAY3, ASSAY4, ASSAY5,<br /> ASSAY6, ASSAY7, ASSAY8]<br />ASSYOUTPUTFMTS = 3 * [SURVEYFORMAT] + 8 * [ASSAYFORMAT]<br /># Have to use this repeatedly - hence list.<br />ASSYFMTOUTPUT = list(zip(ASSYOUTPUTCOLS, ASSYOUTPUTFMTS))<br /><br />RETCHAR = '\n'<br /><br /># For tracking which dataset we're<br /># dealing with.<br />SURVEYSUBDATA = 'survey'<br />ASSAYSUBDATA = 'assay'<br /><br /># For survey/assay dictionary.<br />COR = 'coroutine'<br />FMT = 'format'<br />LAST = 'lastvalue'<br />END = 'end'<br /><br />INFOMESSAGE = 'Now doing hole number {0} . . .'<br /><br />def makecsvdatagenerator(csvrdr, ntname, ntfields):<br /> """<br /> Returns a generator that yields csv<br /> row records as named tuple objects.<br /><br /> csvrdr is the csv.reader object. <br /><br /> ntname is the name given to the<br /> collections.namedtuple object.<br /><br /> ntfields is the list of field names<br /> for the collections.namedtuple object. 
<br /> """<br /> namedtup = namedtuple(ntname, ntfields)<br /> return (namedtup(*linex) for linex in csvrdr)<br /><br />def formatassay(numstring, formatx):<br /> """<br /> Returns a string representing a float<br /> that typically is in 0.00 format, but<br /> other float formats can be applied.<br /><br /> numstring is a string representing a float.<br /><br /> formatx is the desired format (Python 3 format string).<br /> """<br /> return format(float(numstring), formatx)<br /><br />def getnumericstrings(record, formats):<br /> """<br /> Returns list of strings.<br /><br /> record is a collections.namedtuple instance.<br /><br /> formats is a list of two-tuples of namedtuple<br /> attributes and numeric string formats to be<br /> applied to each attribute's value.<br /> """<br /> return [formatassay(getattr(record, pairx[0]),<br /> pairx[1])<br /> for pairx in formats]<br /><br />def coroutinex(subgenerator):<br /> """<br /> Generator function.<br /> <br /> Consumes key value and yields<br /> two tuple of (<boolean>,<br /> next(subgenerator)) in response.<br /> boolean value indicates<br /> whether key matches first<br /> value of subgenerator namedtuple.<br /><br /> subgenerator is a generator of<br /> namedtuples.<br /><br /> Returns a generator.<br /> """<br /> while True:<br /> keyx = yield<br /> subdatatop = next(subgenerator)<br /> if subdatatop.drillholeid == keyx:<br /> yield (True, subdatatop)<br /> for subdataloop in subgenerator:<br /> if subdataloop.drillholeid == keyx:<br /> yield (True, subdataloop)<br /> else:<br /> yield (False, subdataloop)<br /> break<br /> # Case where only one interval in<br /> # drill hole.<br /> else:<br /> yield (False, subdatatop)<br /><br />def formatdataline(record, formats):<br /> """<br /> Prepare record as a line<br /> of text for writing to file.<br /><br /> record is a collections.namedtuple<br /> object.<br /><br /> formats is a list of two-tuples of<br /> namedtuple attributes and numeric<br /> string
formats.<br /><br /> Returns string.<br /> """<br /> recordline = [record.drillholehistoricname]<br /> recordline.extend(getnumericstrings(record,<br /> formats))<br /> return COMMA.join(recordline) + RETCHAR<br /><br />def dealwithsend(subgen, sendval):<br /> """<br /> Helper function to clean up code.<br /> Sends value sendval to<br /> generator/coroutine subgen.<br /><br /> If subgen was left paused on a<br /> (False, record) yield, send() only<br /> moves it on to the bare<br /> keyx = yield, which yields None;<br /> in that case sendval is re-sent.<br /><br /> Returns two tuple of (<boolean>,<br /> <collections.namedtuple>).<br /> """<br /> retval = subgen.send(sendval)<br /> if retval is None:<br /> retval = subgen.send(sendval)<br /> return retval<br /><br />def dealwithyieldrecord(survassay, subdata):<br /> """<br /> Helper function to clean up code.<br /><br /> Formats values for write to file.<br /><br /> survassay is a dictionary of values.<br /><br /> subdata is the dictionary key that<br /> tells which data is being handled<br /> (survey or assay).<br /> """<br /> return formatdataline(survassay[subdata][LAST][1],<br /> survassay[subdata][FMT])<br /><br />def cyclecollars(collargen,<br /> survassay):<br /> """<br /> Generator function that yields<br /> data (strings) for writing to a<br /> specially formatted drill hole<br /> file.<br /><br /> This is the top level generator<br /> for merging the drillhole data<br /> (collars, surveys, assays).<br /><br /> survassay is a collections.OrderedDict<br /> object that references the respective<br /> survey and assay generators and holds<br /> information for tracking which subset<br /> of data (surveys or assays) is being<br /> worked.<br /> """<br /> for record in collargen:<br /> keyx = record.drillholeid<br /> label = record.drillholehistoricname<br /> survassay[SURVEYSUBDATA][END] = label + RETCHAR<br /> print(INFOMESSAGE.format(label))<br /> yield formatdataline(record, COLFMTOUTPUT)<br /> for subdata in survassay:<br /> fmt = survassay[subdata][FMT]<br /> if
survassay[subdata][LAST]:<br /> yield dealwithyieldrecord(survassay, subdata)<br /> subvalue = dealwithsend(survassay[subdata][COR], keyx)<br /> # Case where only one interval.<br /> if not subvalue[0]:<br /> survassay[subdata][LAST] = subvalue<br /> yield survassay[subdata][END]<br /> continue<br /> yield formatdataline(subvalue[1], fmt)<br /> for submessage in survassay[subdata][COR]:<br /> # End of iteration.<br /> if submessage is None:<br /> yield survassay[subdata][END]<br /> break<br /> if submessage[0]:<br /> yield formatdataline(submessage[1], fmt)<br /> else:<br /> survassay[subdata][LAST] = submessage<br /> yield survassay[subdata][END]<br /> break<br /><br />def main():<br /> """<br /> Parse csv dumps from SQL and write<br /> drillhole data fields for import<br /> to old vendor FORTRAN based binary<br /> files.<br /><br /> Side effect function.<br /> """<br /> with open(COLLAR, 'r') as colx:<br /> colcsv = csv.reader(colx)<br /> collargen = makecsvdatagenerator(colcsv,<br /> 'collars',<br /> COLLARFIELDS)<br /> with open(SURVEY, 'r') as svgx:<br /> survcsv = csv.reader(svgx)<br /> survgen = makecsvdatagenerator(survcsv,<br /> 'surveys',<br /> SURVEYFIELDS)<br /> surveycoroutinex = coroutinex(survgen)<br /> with open(ASSAYS, 'r') as assx:<br /> assycsv = csv.reader(assx)<br /> assygen = makecsvdatagenerator(assycsv,<br /> 'assays',<br /> ASSAYFIELDS)<br /> assaycoroutinex = coroutinex(assygen)<br /> with open(DAT201, 'w') as d201:<br /> # Get sub generators/coroutines initialized.<br /> surveycoroutinex.send(None)<br /> assaycoroutinex.send(None)<br /> surveyassay = OrderedDict()<br /> surveyassay[SURVEYSUBDATA] = {COR:surveycoroutinex,<br /> FMT:SURVFMTOUTPUT,<br /> LAST:None,<br /> END:None}<br /> surveyassay[ASSAYSUBDATA] = {COR:assaycoroutinex,<br /> FMT:ASSYFMTOUTPUT,<br /> LAST:None,<br /> END:RETCHAR}<br /> colgenx = cyclecollars(collargen,<br /> surveyassay)<br /> for linex in colgenx:<br /> d201.write(linex)<br /> print('Done')<br /><br />if 
__name__ == '__main__':<br /> main()</span> </b></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><br />
<br /></div>
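<div style="text-align: left;">
<span style="font-family: inherit;">The least obvious part of the code is the hand-off between send() and next(), which is what dealwithsend papers over. Here is a stripped-down toy coroutine showing the same effect; pairs and resend are hypothetical names, not part of the program above. A bare yield hands None back to the caller, so a send() that resumes the generator from a value-yielding yield lands on the bare yield, returns None, and the key has to be sent a second time.</span></div>

```python
def pairs():
    """Toy coroutine: for each key sent in, yield two tagged tuples."""
    while True:
        key = yield          # bare yield: gives None back to the caller
        yield (key, 1)
        yield (key, 2)


def resend(coro, value):
    """Same idea as dealwithsend: send again if the first send gets None."""
    result = coro.send(value)
    if result is None:
        result = coro.send(value)
    return result


coro = pairs()
coro.send(None)              # prime: run up to the first bare yield
print(coro.send('A'))        # ('A', 1)
print(next(coro))            # ('A', 2)
print(coro.send('B'))        # None: resumed past ('A', 2) onto the bare yield
print(coro.send('B'))        # ('B', 1)
print(next(coro))            # ('B', 2)
print(resend(coro, 'C'))     # ('C', 1), after one discarded None
```

<div style="text-align: left;">
<span style="font-family: inherit;">A single retry is enough in the real program because cyclecollars only calls dealwithsend when the coroutine is paused either on the bare yield or on a (False, record) yield, and the latter falls straight through to the bare yield.</span></div>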
<div style="text-align: left;">
<span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-size: small;">The bad news: this was harder to get right with a real-world dataset than I anticipated. Beazley's admonition was an apt one.</span></span></span><br />
<span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-size: small;"> </span></span></span></div>
<div style="text-align: left;">
<span style="font-family: inherit;"><span style="font-family: inherit;"><span style="font-size: small;">The good news: it does perform better than my previous brute-force implementations. Each csv file is read once, row by row, through a generator, so no file is ever held in memory as a whole, and passing control back and forth with send() and next() simply resumes a suspended generator; no polling or interrupts are involved. Also, I learned a bit more about the "yield" keyword.<br /><br />Thanks for stopping by. </span></span><b> </b></span></div>
Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com3tag:blogger.com,1999:blog-524230429673765509.post-86522529209631808132016-04-18T09:52:00.000-07:002016-04-18T09:52:26.607-07:007-Zip-JBinding API with jython on WindowsI have a set of multi-GB Windows folders that I need to archive in 7-Zip format each month. I'd prefer not to compress the folders "manually" with the mouse. I also didn't want to drive the 7-Zip command line through the subprocess module, as <a href="http://pyright.blogspot.com/2014/10/subprocesspopen-or-abusing-home-grown.html" target="_blank">I have with some other programs.</a> Ideally, I wanted to control 7-Zip programmatically. The <a href="http://sevenzipjbind.sourceforge.net/" target="_blank">7-Zip-JBinding libraries</a> offered a means to do this from jython.<br />
<br />
7-Zip-JBinding is built around Java interfaces (callbacks) that have to be implemented in a fairly specific way. I did not venture far from the examples in the 7-Zip-JBinding documentation. I smithed two classes for my own purposes, one for compressing and one for uncompressing, and present their Java code below. The decompression class has a separate method for retrieving the paths of the compressed files. This is not efficient, but given what I need to do and the limitations of the library and of my approach, it works out best.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">import java.io.IOException;<br />import java.io.RandomAccessFile;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">import net.sf.sevenzipjbinding.IOutCreateArchive7z;<br />import net.sf.sevenzipjbinding.IOutCreateCallback;<br />import net.sf.sevenzipjbinding.IOutItem7z;<br />import net.sf.sevenzipjbinding.ISequentialInStream;<br />import net.sf.sevenzipjbinding.SevenZip;<br />import net.sf.sevenzipjbinding.SevenZipException;<br />import net.sf.sevenzipjbinding.impl.OutItemFactory;<br />import net.sf.sevenzipjbinding.impl.RandomAccessFileOutStream;<br />import net.sf.sevenzipjbinding.util.ByteArrayStream;</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">/* Off StackOverflow - works for getting<br /> * file content/bytes from path */<br />import java.nio.file.Files;<br />import java.nio.file.Paths;<br />import java.nio.file.Path;</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">public class SevenZipThing {</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> private static final String RETCHAR = "\n";<br /> private static final String INTFMT = "%,d";<br /> private static final String BYTESTOCOMPRESS = " bytes total to compress\n";<br /> private static final String ERROCCURS = "Error occurs: ";<br /> private static final String COMPRESSFILE = "\nCompressing file ";<br /> private static final String RW = "rw";<br /> private static final int LVL = 5;<br /> private static final String SEVZERR = "7z-Error occurs:";<br /> private static final String ERRCLOSING = "Error closing archive: ";<br /> private static final String ERRCLOSINGFLE = "Error closing file: ";<br /> private static final String SUCCESS = "\nCompression operation succeeded\n";</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> private String filename;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> /* Conversion from a jython list to String[]<br /> * is implicit and poses no problems (JDK7) */<br /> private String[] pathsx;</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> public SevenZipThing(String filename, String[] pathsx) {<br /> this.filename = filename;<br /> this.pathsx = pathsx;<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> /**<br /> * The callback provides information about archive items.<br /> *<br /> * I copied this straight from the 7-Zip-JBinding author's<br /> * example code, but I haven't added much messaging<br /> * or error handling.<br /> */<br /> private final class MyCreateCallback<br /> implements IOutCreateCallback<IOutItem7z> {</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> public void setOperationResult(boolean operationResultOk)<br /> throws SevenZipException {<br /> // Track each operation result here<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> public void setTotal(long total) throws SevenZipException {<br /> // Track operation progress here<br /> <br /> System.out.print(RETCHAR + String.format(INTFMT, total) +<br /> BYTESTOCOMPRESS);<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> public void setCompleted(long complete) throws SevenZipException {<br /> // Track operation progress here<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> public IOutItem7z getItemInformation(int index,<br /> OutItemFactory<IOutItem7z> outItemFactory) {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> IOutItem7z item = outItemFactory.createOutItem();</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Path path = Paths.get(pathsx[index]);<br /> item.setPropertyPath(pathsx[index]);<br /> try {<br /> // Files.size gets the length without reading the file;<br /> // getStream still loads each whole file into a byte[],<br /> // and Java arrays top out at 2 ** 31 - 1 elements.<br /> item.setDataSize(Files.size(path));<br /> return item;<br /> // XXX - I could do a lot better than this (error handling).<br /> } catch (Exception e) {<br /> System.err.println(ERROCCURS + e);<br /> }<br /> return null;<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> public ISequentialInStream getStream(int i)<br /> throws SevenZipException {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Path path = Paths.get(pathsx[i]);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> try {<br /> byte[] data = Files.readAllBytes(path);<br /> System.out.println(COMPRESSFILE + path);<br /> return new ByteArrayStream(data, true);<br /> } catch (Exception e) {<br /> System.err.println(ERROCCURS + e);<br /> }<br /> return null;<br /> }<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> public void compress() {<br /> <br /> /* Mostly copied from sevenZipJBinding's author's code -<br /> * I made the compress method public to work from jython.<br /> * Also, I deal with all of the file listing in jython<br /> * and just pass a list to this class. */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> boolean success = false;<br /> RandomAccessFile raf = null;<br /> IOutCreateArchive7z outArchive = null;<br /> try {<br /> raf = new RandomAccessFile(filename, RW);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> // Open out-archive object<br /> outArchive = SevenZip.openOutArchive7z();</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> // Configure archive<br /> outArchive.setLevel(LVL);<br /> outArchive.setSolid(true);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> // All available processors.<br /> outArchive.setThreadCount(0);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> // Create archive<br /> outArchive.createArchive(new RandomAccessFileOutStream(raf),<br /> pathsx.length, new MyCreateCallback());<br /> success = true;<br /> } catch (SevenZipException e) {<br /> System.err.println(SEVZERR);<br /> // Get more information using extended method<br /> e.printStackTraceExtended();<br /> } catch (Exception e) {<br /> System.err.println(ERROCCURS + e);<br /> } finally {<br /> if (outArchive != null) {<br /> try {<br /> outArchive.close();<br /> } catch (IOException e) {<br /> System.err.println(ERRCLOSING + e);<br /> success = false;<br /> }<br /> }<br /> if (raf != null) {<br /> try {<br /> raf.close();<br /> } catch (IOException e) {<br /> System.err.println(ERRCLOSINGFLE + e);<br /> success = false;<br /> }<br /> }<br /> }<br /> if (success) {<br /> System.out.println(SUCCESS);<br /> }<br /> }<br />}</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new";">import java.io.IOException;<br />import java.io.RandomAccessFile;<br />import java.io.File;<br />import java.io.OutputStream;<br />import java.io.FileOutputStream;<br />import java.io.FileNotFoundException;</span><br />
<span style="font-family: "courier new";">import java.util.Arrays;<br />import java.util.ArrayList;</span><br />
<span style="font-family: "courier new";">import net.sf.sevenzipjbinding.IInArchive;<br />import net.sf.sevenzipjbinding.PropID;<br />import net.sf.sevenzipjbinding.SevenZip;<br />import net.sf.sevenzipjbinding.SevenZipException;<br />import net.sf.sevenzipjbinding.impl.RandomAccessFileInStream;<br />import net.sf.sevenzipjbinding.IArchiveExtractCallback;<br />import net.sf.sevenzipjbinding.ExtractOperationResult;<br />import net.sf.sevenzipjbinding.ExtractAskMode;<br />import net.sf.sevenzipjbinding.ISequentialOutStream;</span><br />
<span style="font-family: "courier new";">/* 7z archive format */<br />/* SEVEN_ZIP is the one I want */<br />import net.sf.sevenzipjbinding.ArchiveFormat;</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";">public class SevenZipThingExtract {</span><br />
<span style="font-family: "courier new";"> private String filename;<br /> private String extractdirectory;<br /> private ArrayList<String> foldersx = null;<br /> private boolean subdirectory = false;</span><br />
<span style="font-family: "courier new";"> private static final String ERROPENINGFLE = "Error opening file: ";<br /> private static final String ERRWRITINGFLE = "Error writing to file: ";<br /> private static final String EXTERR = "Extraction error";<br /> private static final String INFOFMT = "%9X | %10s | %s";<br /> private static final String RETCHAR = "\n";<br /> private static final String INTFMT = "%,d";<br /> private static final String BYTESTOEXTRACT = " bytes total to extract\n";<br /> private static final String RW = "rw";<br /> private static final String BACKSLASH = "\\";<br /> private static final String SEVZERR = "7z-Error occurs:";<br /> private static final String ERROCCURS = "Error occurs: ";<br /> private static final String ERRCLOSING = "Error closing archive: ";<br /> private static final String ERRCLOSINGFLE = "Error closing file: ";</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";"> public SevenZipThingExtract(String filename, String extractdirectory,<br /> boolean subdirectory) {<br /> this.filename = filename;<br /> this.foldersx = new ArrayList<String>();<br /> this.extractdirectory = extractdirectory;<br /> this.subdirectory = subdirectory;<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";"> private final class MyExtractCallback <br /> implements IArchiveExtractCallback {</span><br />
<span style="font-family: "courier new";"> // Copied mostly from example.<br /> private int hash = 0;<br /> private int size = 0;<br /> private int index;<br /> private boolean skipExtraction;<br /> private IInArchive inArchive;</span><br />
<span style="font-family: "courier new";"> private OutputStream outputStream;<br /> private File file;</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";"> public MyExtractCallback(IInArchive inArchive) {<br /> this.inArchive = inArchive;<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";"> @Override<br /> public ISequentialOutStream getStream(int index,<br /> ExtractAskMode extractAskMode)<br /> throws SevenZipException {</span><br />
<span style="font-family: "courier new";"><br /> this.index = index;<br /> // I'm not skipping anything.<br /> skipExtraction = false;</span><br />
<span style="font-family: "courier new";"> String path = (String) inArchive.getProperty(index, PropID.PATH);<br /> // Prepend extractdirectory. For subdirectory items,<br /> // first drop the leading "./" prefix on the stored path.<br /> if (subdirectory) {<br /> path = extractdirectory + BACKSLASH + path.substring(2);<br /> } else {<br /> path = extractdirectory + BACKSLASH + path;<br /> }<br /> file = new File(path);</span><br />
<span style="font-family: "courier new";"> try {<br /> outputStream = new FileOutputStream(file);<br /> } catch (FileNotFoundException e) {<br /> throw new SevenZipException(ERROPENINGFLE<br /> + file.getAbsolutePath(), e);<br /> }<br /> return new ISequentialOutStream() {<br /> public int write(byte[] data) throws SevenZipException {<br /> try {<br /> outputStream.write(data);<br /> } catch (IOException e) {<br /> throw new SevenZipException(ERRWRITINGFLE<br /> + file.getAbsolutePath(), e);<br /> }<br /> // Feed the hash/size summary printed in setOperationResult.<br /> hash ^= Arrays.hashCode(data);<br /> size += data.length;<br /> return data.length; // Return amount of consumed data<br /> }<br /> };<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";"> public void prepareOperation(ExtractAskMode extractAskMode)<br /> throws SevenZipException {<br /> }</span><br />
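<span style="font-family: "courier new";"> public void setOperationResult(ExtractOperationResult extractOperationResult)<br /> throws SevenZipException {<br /> // Close the file opened in getStream so handles don't pile up.<br /> if (outputStream != null) {<br /> try {<br /> outputStream.close();<br /> } catch (IOException e) {<br /> throw new SevenZipException(ERRCLOSINGFLE<br /> + file.getAbsolutePath(), e);<br /> } finally {<br /> outputStream = null;<br /> }<br /> }<br /> // Track each operation result here<br /> if (extractOperationResult != ExtractOperationResult.OK) {<br /> System.err.println(EXTERR);<br /> } else {<br /> System.out.println(String.format(INFOFMT, hash, size,<br /> inArchive.getProperty(index, PropID.PATH)));<br /> hash = 0;<br /> size = 0;<br /> }<br /> }</span><br />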
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";"> public void setTotal(long total) throws SevenZipException {<br /> System.out.print(RETCHAR + String.format(INTFMT, total) +<br /> BYTESTOEXTRACT);<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";"> public void setCompleted(long complete) throws SevenZipException {<br /> // Track operation progress here<br /> }<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";"> private final class MyGetPathsCallback <br /> implements IArchiveExtractCallback {</span><br />
<span style="font-family: "courier new";"> // Copied mostly from example.<br /> private int hash = 0;<br /> private int size = 0;<br /> private int index;<br /> private boolean skipExtraction;<br /> private IInArchive inArchive;</span><br />
<span style="font-family: "courier new";"> public MyGetPathsCallback(IInArchive inArchive) {<br /> this.inArchive = inArchive;<br /> }</span><br />
<span style="font-family: "courier new";"> public ISequentialOutStream getStream(int index,<br /> ExtractAskMode extractAskMode)<br /> throws SevenZipException {<br /> this.index = index;<br /> // I'm not skipping anything.<br /> skipExtraction = false;</span><br />
<span style="font-family: "courier new";"> String path = (String) inArchive.getProperty(index,<br /> PropID.PATH);<br /> foldersx.add(path);</span><br />
<span style="font-family: "courier new";"> return new ISequentialOutStream() {<br /> public int write(byte[] data) throws SevenZipException {<br /> hash ^= Arrays.hashCode(data);<br /> size += data.length;<br /> // Return amount of processed data<br /> return data.length;<br /> }<br /> };<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";"> public void prepareOperation(ExtractAskMode extractAskMode)<br /> throws SevenZipException {<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";"> public void setOperationResult(ExtractOperationResult extractOperationResult)<br /> throws SevenZipException {<br /> // Track each operation result here<br /> if (extractOperationResult != ExtractOperationResult.OK) {<br /> System.err.println(EXTERR);<br /> } else {<br /> System.out.println(String.format(INFOFMT, hash, size,<br /> inArchive.getProperty(index, PropID.PATH)));<br /> hash = 0;<br /> size = 0;<br /> }<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";"> public void setTotal(long total) throws SevenZipException {<br /> System.out.print(RETCHAR + String.format(INTFMT, total) +<br /> BYTESTOEXTRACT);<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";"> public void setCompleted(long complete) throws SevenZipException {<br /> // Track operation progress here<br /> }<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";"> public void extractfiles() {<br /> <br /> boolean success = false;<br /> RandomAccessFile raf = null;<br /> IInArchive inArchive = null;<br /> try {<br /> raf = new RandomAccessFile(filename, RW);</span><br />
<span style="font-family: "courier new";"> inArchive = SevenZip.openInArchive(ArchiveFormat.SEVEN_ZIP, <br /> new RandomAccessFileInStream(raf));</span><br />
<span style="font-family: "courier new";"> int itemCount = inArchive.getNumberOfItems();<br /> <br /> // From StackOverflow - could use IntStream,<br /> // but that's Java 1.8 (using 1.7).<br /> int[] fileindices = new int[itemCount];<br /> for(int k = 0; k < fileindices.length; k++)<br /> fileindices[k] = k;<br /> inArchive.extract(fileindices, false,<br /> new MyExtractCallback(inArchive));<br /> } catch (SevenZipException e) {<br /> System.err.println(SEVZERR);<br /> // Get more information using extended method<br /> e.printStackTraceExtended();<br /> } catch (Exception e) {<br /> System.err.println(ERROCCURS + e);<br /> } finally {<br /> if (inArchive != null) {<br /> try {<br /> inArchive.close();<br /> } catch (IOException e) {<br /> System.err.println(ERRCLOSING + e);<br /> }<br /> }<br /> if (raf != null) {<br /> try {<br /> raf.close();<br /> } catch (IOException e) {<br /> System.err.println(ERRCLOSINGFLE + e);<br /> }<br /> }<br /> }<br /> }</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new";"> public ArrayList<String> getfolders() {<br /> <br /> boolean success = false;<br /> RandomAccessFile raf = null;<br /> IInArchive inArchive = null;</span><br />
<span style="font-family: "courier new";"> try {<br /> raf = new RandomAccessFile(filename, RW);</span><br />
<span style="font-family: "courier new";"> inArchive = SevenZip.openInArchive(ArchiveFormat.SEVEN_ZIP, <br /> new RandomAccessFileInStream(raf));</span><br />
<span style="font-family: "courier new";"> int itemCount = inArchive.getNumberOfItems();<br /> <br /> // From StackOverflow - could use IntStream,<br /> // but that's Java 1.8 (using 1.7).<br /> int[] fileindices = new int[itemCount];<br /> for(int k = 0; k < fileindices.length; k++)<br /> fileindices[k] = k;<br /> inArchive.extract(fileindices, false,<br /> new MyGetPathsCallback(inArchive));<br /> } catch (SevenZipException e) {<br /> System.err.println(SEVZERR);<br /> // Get more information using extended method<br /> e.printStackTraceExtended();<br /> } catch (Exception e) {<br /> System.err.println(ERROCCURS + e);<br /> } finally {<br /> if (inArchive != null) {<br /> try {<br /> inArchive.close();<br /> } catch (IOException e) {<br /> System.err.println(ERRCLOSING + e);<br /> }<br /> }<br /> if (raf != null) {<br /> try {<br /> raf.close();<br /> } catch (IOException e) {<br /> System.err.println(ERRCLOSINGFLE + e);<br /> }<br /> }<br /> }<br /> return foldersx;<br /> }<br />}</span><br />
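A note on the destination paths in MyExtractCallback.getStream: the path stored in the archive is glued onto extractdirectory with a backslash, and for subdirectory items the first two characters (usually the "./" prefix the jython side adds) are dropped. The same mapping can be sketched in Python with ntpath, so it behaves like Windows on any OS. extractpath is a hypothetical helper, not part of the program, and unlike the Java code it only strips the prefix when the prefix is actually there. FileOutputStream does not create missing parent folders, so the subfolders have to exist before extraction either way.

```python
import ntpath

# Prefixes the jython side puts in front of archived paths ('./' or '.\').
SAMEFOLDERPREFIXES = ('./', '.\\')


def extractpath(extractdirectory, archivepath, subdirectory):
    """Map a path stored in the archive to a Windows destination path.

    Hypothetical helper mirroring MyExtractCallback.getStream: with
    subdirectory set, a leading './' (or '.\\') is dropped before
    extractdirectory is prepended. ntpath applies Windows path rules
    on any OS.
    """
    if subdirectory and archivepath[:2] in SAMEFOLDERPREFIXES:
        archivepath = archivepath[2:]
    # normpath also converts forward slashes to backslashes.
    return ntpath.normpath(ntpath.join(extractdirectory, archivepath))


print(extractpath('C:\\Extract', './06SOLIDS/pit.msr', True))
# prints C:\Extract\06SOLIDS\pit.msr
```

Checking the prefix before slicing costs one comparison and avoids chopping two real characters off a path that was stored without "./".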
<span style="font-family: "courier new";"></span><br />
<span style="font-family: inherit;">The getfolders method in the SevenZipThingExtract class is the extra method I added to get the list of folders: it walks every item in the archive and records its stored path without writing anything to disk. As noted in the jython code below, the limits on the number of bytes and files that can be compressed at once mean larger files have to be split into chunks first. Also, for my specific use case, I need to extract files to a specific folder and set of subfolders. My methodology is outlined in the comments in the jython code. The good news: if I get run over by a bus and the decompression part of the program gets lost, people will be able to get the files back with some effort. The bad news: they will be cursing my headstone. You do the best you can.<br /><br />There are three jython modules. The first one, folderstozip.py, is just constants:</span><br />
<span style="font-family: inherit;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;">#!java -jar C:\jython-2.7.0\jython.jar</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># folderstozip.py</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">"""<br />Constants used in compression and<br />decompression.<br />"""</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">FRONTSLASH = '/'<br />BACKSLASH = '\\'<br />EMPTY = ''<br />SAMEFOLDER = './'<br />SAMEFOLDERWIN = u'.\\'</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">SPLITFILETRACKER = 'SPLITFILETRACKER.csv'</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">SPLITFILE = '{0:s}.{1:s}'</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">UCOMMA = u','</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># 3rd party sevenZipJBindings library.<br />PATH7ZJB = 'C:/MSPROJECTS/EOMReconciliation/2016/03March'<br />PATH7ZJB += '/Backup/sevenzipjbinding/lib/sevenzipjbinding.jar'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># OS specific 3rd party sevenZipJBindings library.<br />PATH7ZJBOSSPEC = r'C:/MSPROJECTS/EOMReconciliation/2016/03March'<br />PATH7ZJBOSSPEC += '/Backup/sevenzipjbinding/lib/sevenzipjbinding-Windows-amd64.jar'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">PROGFOLDER = 'C:/MSPROJECTS/EOMReconciliation/2016/03March/Backup'<br />PROGFOLDER += FRONTSLASH</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># Informational messages.<br />WROTEFILE = 'Wrote file {:s}\n'<br />SPLITFILEMSG = 'Have now split {0:,d} bytes of file {1:s} into {2:d} chunks of up to {3:,d} bytes.\n'<br />DONESPLITTING = '\nDone splitting file'<br />FILESAFTERSPLIT = '\n{:d} files after split'</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">COMPRESSING = '\nCompressing file {:s} . . .\n'<br />DELETING = '\nDeleting file {:s} . . .\n'<br />DELETINGDIR = '\nNow deleting {:s} . . .\n'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># Five digits: room for 100,000 unique names (00000-99999).<br />UNIQUEX = '{0:05d}'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># XXX - multiple file archives limited to<br /># 10KB - reason unknown - crashes jvm<br /># with IInStream interface class not <br /># found.<br /># XXX - choked on 8700 bytes - try dropping<br /># this from 9500 to 8500.<br />MULTFILELIMIT = 8500<br />HALFLIMIT = MULTFILELIMIT // 2</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># About 50 splits for a 3GB file.<br />CHUNK = 2 ** 26</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># Path plus split number.<br />FILEN = r'{0:s}.{1:03d}'</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># Path plus basefilename.<br />FILEB = r'{0:s}{1:s}'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># Read/Write constants.<br />RB = 'rb'<br />WB = 'wb'<br />W = 'w'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># Base directory plus unique archive name.<br />ARCHIVEX = '{0:s}/{1:s}.7z'</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /># multifile archive<br />MULTARCHIVEX = '{0:s}/archive{1:03d}.7z'<br />MULTFILES = '. . . multiple files'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># File categories.<br /># Size less than HALFLIMIT.<br />SMALL = 'small'<br /># Size greater than or equal to HALFLIMIT but<br /># less than or equal to CHUNK.<br />MEDIUM = 'medium'<br /># Larger than CHUNK.<br />LARGE = 'large'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">BASEPATH = 'basepath'</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br />FILES = 'files'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># XXX - replace these with folder names<br /># that are recognizable within your own<br /># domain - mine are open pit mining<br /># area names.<br />BASEDIRS = ['Pit-1', 'Pit-2', 'Pit-3']</span><br />
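The CHUNK, FILEN and SPLITFILETRACKER constants describe the splitting scheme: a large file becomes file.001, file.002 and so on, and each piece is logged as a "path,NNN" line in the tracker file. Putting a file back together is the reverse. Below is a minimal sketch of that reverse step, assuming the chunk files have already been extracted back to their original locations. chunkcount and rejoin are hypothetical helpers, not part of the program, and the code sticks to syntax that runs on both Jython 2.7 and CPython 3. chunkcount uses ceiling division, so a file that is an exact multiple of CHUNK needs no empty trailing piece; for a 3 GiB file it comes to 48 chunks, in line with the "about 50" comment on CHUNK.

```python
import shutil

# Stand-ins mirroring folderstozip.CHUNK and folderstozip.FILEN.
CHUNK = 2 ** 26
FILEN = '{0:s}.{1:03d}'


def chunkcount(sizeoffile, chunk=CHUNK):
    """Number of chunk files for sizeoffile bytes (ceiling division)."""
    return max(1, (sizeoffile + chunk - 1) // chunk)


def rejoin(trackerpath):
    """Rebuild each split file from its numbered chunks.

    Assumes every tracker line is 'originalpath,NNN' (the
    SPLITFILERECORD format) and that the chunk files sit next
    to the original path. Returns the rebuilt paths, sorted.
    """
    chunks = {}
    with open(trackerpath) as tracker:
        for line in tracker:
            line = line.rstrip('\r\n')
            if not line:
                continue
            # Split on the last comma only, in case a path contains one.
            original, number = line.rsplit(',', 1)
            chunks.setdefault(original, []).append(int(number))
    for original, numbers in chunks.items():
        with open(original, 'wb') as out:
            for number in sorted(numbers):
                with open(FILEN.format(original, number), 'rb') as part:
                    # Stream each chunk rather than reading it whole.
                    shutil.copyfileobj(part, out)
    return sorted(chunks)
```

rejoin overwrites anything already sitting at the original path, so it is worth trying on a copy of the extracted folder first.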
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">#!java -jar C:/jython-2.7.0/jython.jar</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># sevenzipper.py</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">"""<br />Use java 3rd party 7-zip compression<br />library (sevenZipJBindings) from<br />jython to 7zip up MineSight project<br />files.<br />"""</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">import folderstozip as fld</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># Need to adjust path to get necessary jar imports.<br />import sys</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># Need for os.path<br />import os</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># Original path of file plus split number.<br />SPLITFILERECORD = '{0:s},{1:03d}'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">sys.path.append(fld.PATH7ZJB)<br />sys.path.append(fld.PATH7ZJBOSSPEC)</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># java 7zip library<br />import SevenZipThing as z7thing</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># For copying files to program<br /># directory and deleting the old<br /># ones where necessary.<br />import shutil</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># For unique archive names.<br />import itertools</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br />COUNTERX = itertools.count(0, 1)</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def splitfile(originalfilepath, splitfilestrackerfile):<br /> """<br /> Split file at (string) originalfilepath<br /> into fld.CHUNK sized chunks and indicate<br /> sequence by number in new split file<br /> name.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Return generator of relative file paths<br /> inside project folder.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> originalfilepath is the path of the<br /> file that needs to be split into parts.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> splitfilestrackerfile is an open file<br /> object used for tracking file splits<br /> for later retrieval.<br /> """<br /> sizeoffile = os.path.getsize(originalfilepath)<br /> # Ceiling division, so a file that is an exact<br /> # multiple of CHUNK gets no empty trailing chunk.<br /> chunks = (sizeoffile + fld.CHUNK - 1) // fld.CHUNK<br /> # Counter.<br /> i = 1<br /> with open(originalfilepath, fld.RB) as f:<br /> while i <= chunks:<br /> with open(fld.FILEN.format(originalfilepath, i), fld.WB) as f2:<br /> f2.write(f.read(fld.CHUNK))<br /> print(fld.WROTEFILE.format(fld.FILEN.format(originalfilepath, i)))<br /> print(fld.SPLITFILEMSG.format(f.tell(), originalfilepath, i, fld.CHUNK))<br /> print >> splitfilestrackerfile, (SPLITFILERECORD.format(originalfilepath, i))<br /> i += 1<br /> print(fld.DONESPLITTING)<br /> print(fld.FILESAFTERSPLIT.format(i - 1))<br /> return (fld.FILEN.format(originalfilepath, x) for x in xrange(1, i))</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def movefiles(movefilesx, intermediatepath):<br /> """<br /> Move files from MineSight project directory<br /> to program directory.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Return a list of paths, relative to the program<br /> folder, for the moved files.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> movefilesx is a generator of file paths.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> intermediatepath is a string relative path<br /> between the program folder and the sub-folder<br /> of the MineSight directory (_msresources/06SOLIDS,<br /> for example).<br /> """<br /> # Move files to that folder.<br /> movedfiles = []<br /> for pathx in movefilesx:<br /> shutil.move(pathx, fld.PROGFOLDER + intermediatepath +<br /> os.path.basename(pathx))<br /> movedfiles.append(intermediatepath + os.path.basename(pathx))<br /> return movedfiles</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def copyfiles(copyfilesx, intermediatepath):<br /> """<br /> Copy files from MineSight project directory<br /> to program directory.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Return a list of paths, relative to the program<br /> folder, for the copied files.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> copyfilesx is a generator of file paths.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> intermediatepath is a string relative path<br /> between the program folder and the sub-folder<br /> of the MineSight directory (_msresources/06SOLIDS,<br /> for example).<br /> """<br /> # Copy files to that folder.<br /> copiedfiles = []<br /> for pathx in copyfilesx:<br /> shutil.copyfile(pathx, fld.PROGFOLDER + intermediatepath +<br /> os.path.basename(pathx))<br /> copiedfiles.append(intermediatepath + os.path.basename(pathx))<br /> return copiedfiles</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def compressfilessingle(filestocompress, prefix, basedir):<br /> """<br /> Compress each file into its own archive.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> This is for larger files that take up<br /> an entire archive (7z file).</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> filestocompress is a list of paths of<br /> files to be compressed. These files<br /> reside inside the program directory.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> prefix is a string path addition, usually<br /> './' that allows the function to deal<br /> with relative paths for files that reside<br /> in subfolders.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> basedir is the name of the main MineSight<br /> project directory (Fwaulu, for example).</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Side effect function.<br /> """<br /> for pathx in filestocompress:<br /> basename = os.path.split(pathx)[1]<br /> # Need unique name for subfolder files with same names.<br /> uniqueid = fld.UNIQUEX.format(COUNTERX.next())<br /> uniquename = uniqueid + basename<br /> print(fld.COMPRESSING.format(prefix + basename))<br /> archx = z7thing(fld.ARCHIVEX.format(basedir, uniquename),<br /> [prefix + basename])<br /> archx.compress()</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def compressfilesmultiple(filestocompress, indexx, basedir):<br /> """<br /> Compresses files into an archive.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> filestocompress is a list of paths of<br /> files to be compressed. These files<br /> reside inside the program directory.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> indexx is an integer that gives the<br /> archive a unique name.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> basedir is the name of the main MineSight<br /> project directory (Fwaulu, for example).</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Side effect function.<br /> """<br /> print(fld.COMPRESSING.format(fld.MULTFILES))<br /> archx = z7thing(fld.MULTARCHIVEX.format(basedir, indexx),<br /> filestocompress)<br /> archx.compress()</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def segregatefiles(directoryx, basefiles):<br /> """<br /> From a string directory path directoryx<br /> and a list of base file names, returns<br /> a dictionary of lists of files and their<br /> sizes sorted on size and keyed on file<br /> category.<br /> """<br /> retval = {}<br /> # Add separator to end of directory path.<br /> directoryx += fld.FRONTSLASH<br /> # Get all files in folder and their sizes.<br /> allfiles = [(os.path.getsize(fld.FILEB.format(directoryx, filex)), filex)<br /> for filex in basefiles]<br /> retval[fld.SMALL] = [x for x in allfiles if x[0] < fld.HALFLIMIT]<br /> retval[fld.SMALL].sort()<br /> retval[fld.MEDIUM] = [x for x in allfiles if x[0] >= fld.HALFLIMIT and<br /> x[0] <= fld.CHUNK]<br /> retval[fld.MEDIUM].sort()<br /> retval[fld.LARGE] = [x for x in allfiles if x[0] > fld.CHUNK]<br /> retval[fld.LARGE].sort()<br /> return retval</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def deletefiles(movedfiles):<br /> """<br /> Delete files that have been compressed.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> movedfiles is a list of paths of<br /> files that have been moved or copied to<br /> the program directory for compression.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Side effect function.<br /> """<br /> for pathx in movedfiles:<br /> print(fld.DELETING.format(pathx))<br /> os.remove(pathx)</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def getsmallfilegroupings(smallfiles):<br /> """<br /> Generator function that yields<br /> a list of files whose sum is <br /> less than the program's limit<br /> for bytes to be archived in a <br /> multiple file archive.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> smallfiles is a list of two tuples<br /> of (filesize in bytes, file path).<br /> """<br /> lenx = len(smallfiles)<br /> insidecounter1 = 0<br /> insidecounter2 = 1<br /> sumx = 0<br /> while (insidecounter2 < (lenx + 1)):<br /> sumx = sum(x[0] for x in smallfiles[insidecounter1:insidecounter2])<br /> if sumx > fld.MULTFILELIMIT:<br /> # Back up one.<br /> insidecounter2 -= 1<br /> yield (x[1] for x in smallfiles[insidecounter1:insidecounter2])<br /> # Reset and advance counters.<br /> sumx = 0<br /> insidecounter1 = insidecounter2 + 1<br /> insidecounter2 = insidecounter1 + 1<br /> else:<br /> insidecounter2 += 1</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def compresslargefiles(largefiles, dirx, prefix, basedir, splitfilestrackerfile):<br /> """<br /> Deal with compression of files that need to<br /> be split prior to compression.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> largefiles is a list of two tuples of file<br /> sizes and names.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> dirx is the directory (str) in which the files<br /> are located.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> prefix is a string prefix to augment path<br /> identification for compression.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> basedir is the name of the main MineSight<br /> project directory (Fwaulu, for example).</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> splitfilestrackerfile is an open file<br /> object used for tracking file splits<br /> for later retrieval.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Side effect function.<br /> """<br /> for filex in largefiles:<br /> # Get generator of paths of splits.<br /> splitfiles = splitfile(fld.FILEB.format(dirx, filex[1]),<br /> splitfilestrackerfile)<br /> movedfiles = movefiles(splitfiles, prefix)<br /> compressfilessingle(movedfiles, prefix, basedir)<br /> deletefiles(movedfiles)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def compressmediumfiles(mediumfiles, dirx, prefix, basedir):<br /> """<br /> Deal with compression of files that need to<br /> be compressed each to its own archive.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mediumfiles is a list of two tuples of file<br /> sizes and paths.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> dirx is the directory (str) in which the files<br /> are located.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> prefix is a string prefix to augment path<br /> identification for compression.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> basedir is the name of the main MineSight<br /> project directory (Fwaulu, for example).</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Side effect function.<br /> """<br /> filestocopy = (dirx + x[1] for x in mediumfiles)<br /> copiedfiles = copyfiles(filestocopy, prefix)<br /> compressfilessingle(copiedfiles, prefix, basedir)<br /> deletefiles(copiedfiles)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def compresssmallfiles(smallfiles, dirx, prefix, indexx, basedir):<br /> """<br /> Deal with compression of files that can be<br /> compressed in groups.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mediumfiles is a list of two tuples of file<br /> sizes and paths.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> dirx is the directory (str) in which the files<br /> are located.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> prefix is a string prefix to augment path<br /> identification for compression.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> indexx is the current index that the 7zip<br /> file counter (ensures unique archive name)<br /> is on.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> basedir is the name of the main MineSight<br /> project directory (Fwaulu, for example).</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Returns integer for current archive counter<br /> index.<br /> """<br /> smallgroupings = getsmallfilegroupings(smallfiles)<br /> while True:<br /> try:<br /> grouplittlefiles = smallgroupings.next()<br /> littlefiles = (dirx + x for x in grouplittlefiles)<br /> copiedfiles = copyfiles(littlefiles, prefix)<br /> compressfilesmultiple(copiedfiles, indexx, basedir)<br /> indexx += 1<br /> deletefiles(copiedfiles)<br /> except StopIteration:<br /> break<br /> return index</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># XXX - hack<br />def matchbasedir(folderlist):<br /> """<br /> Get MineSight project folder name<br /> that matches a folder in the path<br /> in question.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> folderlist is a list (in order)<br /> of directories in a path.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Returns string.<br /> """<br /> for folderx in folderlist:<br /> for projx in fld.BASEDIRS:<br /> if projx == folderx:<br /> return folderx<br /> return None</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def getbasedir(pathx):<br /> """<br /> Returns two tuple of strings for<br /> basedir and basefolder (project<br /> directory name and base path under<br /> project directory copied to program<br /> directory).</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> pathx is the directory path being<br /> processed (str).<br /> """<br /> # basedir is project name (Fwaulu, for example).<br /> foldernames = pathx.split(fld.FRONTSLASH)<br /> basedir = matchbasedir(foldernames)<br /> # Get directory under project directory.<br /> # _msresources, for example.<br /> idx = foldernames.index(basedir)<br /> # Directory under program directory ./ for MineSight files.<br /> basefolder = fld.SAMEFOLDER + fld.FRONTSLASH.join(foldernames[idx + 1:])<br /> return basedir, basefolder</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def dealwithtoplevel(firstdir):<br /> """<br /> Compress top level files in the <br /> MineSight project directory.<br /> <br /> firstdir is the three tuple returned<br /> from the os.walk() generator function.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Returns two tuple of integer smallfile<br /> multifilecounter used for naming<br /> multiple file archives and splitfilestrackerfile,<br /> an open file object for tracking split<br /> files for later reconstruction.<br /> """<br /> # Top level files.<br /> dirx = firstdir[0] + fld.FRONTSLASH<br /> basedir, basefolder = getbasedir(dirx)<br /> # File to track split files for later glueing back together.<br /> splitfilestrackerfile = open(fld.SAMEFOLDER + basedir + fld.FRONTSLASH +<br /> fld.SPLITFILETRACKER, fld.W)<br /> firstdirfiles = segregatefiles(firstdir[0], firstdir[2])<br /> compresslargefiles(firstdirfiles[fld.LARGE], dirx, fld.EMPTY, basedir,<br /> splitfilestrackerfile)<br /> compressmediumfiles(firstdirfiles[fld.MEDIUM], dirx, fld.EMPTY, basedir)<br /> # This is for keeping track of<br /> # archives with more than one file.<br /> multifilecounter = 1<br /> mulitfilecounter = compresssmallfiles(firstdirfiles[fld.SMALL], dirx,<br /> fld.EMPTY, multifilecounter, basedir)<br /> return multifilecounter, splitfilestrackerfile</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def dealwithlowerleveldirectories(dirs, multifilecounter, splitfilestrackerfile):<br /> """<br /> Finishes out compression of lower level<br /> folders under top level MineSight project<br /> directory.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> dirs is a partially exhausted (one iteration)<br /> os.walk() generator.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> multifilecounter is an integer used for<br /> naming multiple file archives.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> splitfilestrackerfile is an open file<br /> object used for tracking file splits<br /> for later retrieval.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Returns orphanedfolders, a list of lower level<br /> folders to be deleted at the end of the program<br /> run.<br /> """<br /> orphanedfolders = []<br /> for dirx in dirs:<br /> # XXX - hack - I hate dealing with Windows paths.<br /> dirn = dirx[0].replace(fld.BACKSLASH, fld.FRONTSLASH)<br /> diry = dirn + fld.FRONTSLASH<br /> basedir, basefolder = getbasedir(diry)<br /> # Create directory in program path.<br /> fauxdir = fld.PROGFOLDER[:-1] + basefolder[1:-1]<br /> os.mkdir(fauxdir)<br /> orphanedfolders.append(fauxdir)<br /> # Skip anything that doesn't have files.<br /> if not dirx[2]:<br /> continue<br /> # Easiest way to do this might be<br /> # to track directories and sort<br /> # files according to size, then<br /> # filter them accordingly.<br /> dirfiles = segregatefiles(dirx[0], dirx[2])<br /> compresslargefiles(dirfiles[fld.LARGE], diry, basefolder,<br /> basedir, splitfilestrackerfile)<br /> compressmediumfiles(dirfiles[fld.MEDIUM], diry, basefolder, basedir)<br /> multifilecounter = compresssmallfiles(dirfiles[fld.SMALL], diry, basefolder,<br /> multifilecounter, basedir)<br /> splitfilestrackerfile.close()<br /> return orphanedfolders</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def walkdir(dirx):<br /> """<br /> Traverse MineSight project directory,<br /> 7zipping everything along the way.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> dirx is a string for the directory<br /> to traverse.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Side effect function.<br /> """<br /> dirs = os.walk(dirx)<br /> # OK - os.walk returns generator that<br /> # yields a tuple in the format<br /> # (str path,<br /> # [list of paths for directories under path],<br /> # [list of filenames under path])</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> # Top level (Fwaulu, for instance).<br /> # These files will not have a path<br /> # prefix of any sort in their respective<br /> # archives.<br /> firstdir = dirs.next()<br /> multifilecounter, splitfilestrackerfile = dealwithtoplevel(firstdir)<br /> # All other files and folders.<br /> orphanedfolders = dealwithlowerleveldirectories(dirs, multifilecounter,<br /> splitfilestrackerfile)<br /> # Delete lower level folders first - this is necessary.<br /> orphanedfolders.reverse()<br /> for orphanx in orphanedfolders:<br /> print(fld.DELETINGDIR.format(orphanx))<br /> os.rmdir(orphanx)</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def cyclefolders(folderx):<br /> """<br /> Wrapper function for compression<br /> of folder folderx (string).</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Side effect function.<br /> """<br /> # 1) Set up empty project directory (ex: Fwaulu)<br /> # in program directory.<br /> # 2) For first set of files, use no prefix for<br /> # 7zip archive storage (filename only).<br /> # 3) Check for size of file.<br /> # 4) If file is bigger than fld.CHUNK, split.<br /> # 5) If file is smaller than fld.CHUNK, but bigger than<br /> # MULTFILELIMIT, compress to one archive.<br /> # 6) If file is smaller than fld.CHUNK, and smaller than<br /> # MULTFILELIMIT, check subsequent files to determine<br /> # files to include in archive. Keep track of file<br /> # index that puts number of bytes over limit.<br /> # 7) Compress multiple files to one archive - index<br /> # archive to ensure unique name.<br /> # 8) For all following sets of files, same process,<br /> # but must prefix paths with SAMEFOLDER and any<br /> # additional folder names.<br /> foldertracker = []<br /> # Make directory folder in program directory<br /> # to hold 7zip files.<br /> zipfolder = getbasedir(folderx)[0]<br /> os.mkdir(zipfolder)<br /> foldertracker.append(zipfolder)<br /> walkdir(folderx)<br /> print('\nDone')</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: inherit;">cyclefolders is the overarching wrapper function for the module (compression operation).</span><br />
<br />
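The comments in walkdir describe what os.walk yields, and the reverse() before deleting is what lets os.rmdir succeed. Because the walk is top-down by default, every folder is yielded before any of its subfolders, so reversing the list puts children ahead of their parents. Here is a small self-contained illustration on a throwaway temporary tree (the folder and file names are made up). It uses next(walker), which works in both jython 2.7 and Python 3; walker.next() is Python 2 only.

```python
import os
import shutil
import tempfile

# Build a throwaway tree: top/ with one file and nested subfolders.
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, 'sub1', 'deeper'))
os.mkdir(os.path.join(top, 'sub2'))
with open(os.path.join(top, 'toplevel.txt'), 'w') as f:
    f.write('x')

walker = os.walk(top)
# First tuple: (path, [subdirectory names], [file names]) for top itself.
firstdir = next(walker)
print(firstdir[0] == top, sorted(firstdir[1]), firstdir[2])

# Remaining tuples come parent-before-child (top-down walk).
lowerdirs = [dirx[0] for dirx in walker]

# Reverse so each (empty) child folder is removed before its parent.
lowerdirs.reverse()
for dirx in lowerdirs:
    os.rmdir(dirx)

print(os.listdir(top))
shutil.rmtree(top)
```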
<span style="font-family: "Courier New", Courier, monospace;">#!java -jar C:\jython2.7.0\jython.jar</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;"># unsevenzipper.py</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">"""<br />Use java 3rd party 7-zip compression<br />library (sevenZipJBindings) from<br />jython to un-7zip archives.<br />"""</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;"># Need to adjust path to get necessary jar imports.<br /># XXX - it might be cleaner to chain imports by using<br /># the sevenzipper (s7 alias) below to reference<br /># double imported modules. For development and<br /># convenience I reimported everything as though<br /># sevenzipper.py and unsevenzipper.py were separate<br /># operations.<br />import sys<br />import folderstozip as fld<br />sys.path.append(fld.PATH7ZJB)<br />sys.path.append(fld.PATH7ZJBOSSPEC)</span><br />
<span style="font-family: "Courier New", Courier, monospace;">import os</span><br />
<span style="font-family: "Courier New", Courier, monospace;">import sevenzipper as s7</span><br />
<span style="font-family: "Courier New", Courier, monospace;">import SevenZipThingExtract</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">def subdirectoryornot(pathx):<br /> """<br /> Boolean function that returns<br /> True if string pathx is a<br /> subdirectory of the MineSight<br /> project folder and False if<br /> the files belong directly to<br /> the MineSight project folder.<br /> """<br /> pathx = pathx.replace(fld.SAMEFOLDERWIN, fld.BACKSLASH)<br /> pathlist = pathx.split(fld.BACKSLASH)<br /> if len(pathlist) > 1:<br /> return True<br /> return False</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">def getdirectories(dirx):<br /> """<br /> Get list of lists of directories<br /> in path under project folder<br /> from 7zip archives in project<br /> folder for archives.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> Returns two tuple of list and<br /> dictionary indicating which<br /> 7z files are same directory<br /> archives and which are archived<br /> subdirectory files.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> dirx is a string for the file<br /> path of the directory to<br /> be walked (./Fwaulu for example).<br /> """<br /> dirs = os.walk(dirx)<br /> # One level, no subfolders.<br /> files = dirs.next()[2]<br /> # Get directories first.<br /> rawpaths = []<br /> subdirornot = {}<br /> for filex in files:<br /> # Skip uncompressed split file tracker.<br /> if filex == fld.SPLITFILETRACKER:<br /> continue<br /> # I don't know if it's a subdirectory or not, so I'll go with False.<br /> s7tx = SevenZipThingExtract(dirx + fld.FRONTSLASH + filex, dirx, False)<br /> folders = list(s7tx.getfolders())<br /> rawpaths.extend(folders)<br /> # All the paths in folders have the same prefix - <br /> # just do one.<br /> subdirornot[filex] = subdirectoryornot(folders[0])<br /> # Get just directories<br /> justdirectories = [pathx.replace(fld.SAMEFOLDERWIN, fld.BACKSLASH).split(fld.BACKSLASH)[1:-1]<br /> for pathx in rawpaths if pathx.split(fld.BACKSLASH)[1:-1]]<br /> justdirectories = set([tuple(x) for x in justdirectories])<br /> justdirectories = list(justdirectories)<br /> justdirectories.sort()<br /> return justdirectories, subdirornot</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">def makedirectories(dirn):<br /> """<br /> Create directory paths within archive<br /> project folder to accept uncompressed<br /> files.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> Returns subdirornot dictionary.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> dirn is a string for the file<br /> path of the directory to<br /> be walked (./Fwaulu for example).<br /> """<br /> justdirectories, subdirornot = getdirectories(dirn)<br /> maxdepth = max(len(dirx) for dirx in justdirectories)<br /> for x in xrange(0, maxdepth):<br /> justdirectoriesii = set([tuple(dirx[0:x + 1]) for dirx in justdirectories<br /> if len(dirx) >= x + 1])<br /> for diry in justdirectoriesii:<br /> dirw = dirn + fld.FRONTSLASH + fld.FRONTSLASH.join(diry)<br /> os.mkdir(dirw)<br /> return subdirornot</span><br />
<span style="font-family: "Courier New", Courier, monospace;">def extractfiles(dirx):<br /> """<br /> Extract files from 7z files<br /> in project archive folder.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> Side effect function.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> dirx is a string for the file<br /> path of the directory to<br /> be walked.<br /> """<br /> subdirornot = makedirectories(dirx)<br /> dirs = os.walk(dirx)<br /> # One level, no subfolders.<br /> files = dirs.next()[2]<br /> for filex in files:<br /> # Skip uncompressed split file tracker.<br /> if filex == fld.SPLITFILETRACKER:<br /> continue<br /> s7tx = SevenZipThingExtract(dirx + fld.FRONTSLASH + filex,<br /> dirx, subdirornot[filex])<br /> s7tx.extractfiles()</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">def gluetogethersplitfiles(dirx):<br /> """<br /> Make split up files whole.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> Side effect function.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> dirx is the folder in which the split<br /> files reside.<br /> """<br /> # Glue together big files.<br /> # Do this in a very controlling,<br /> # structured way:<br /> # 1) Read the split file tracker csv file.<br /> # 2) Determine the number and names and paths<br /> # of files to be reconstructed and the<br /> # number of parts in each.<br /> # 3) Check that everything is there for<br /> # each file to be reconstructed.<br /> # 4) Get the new relative path.<br /> # 5) Glue back together programmatically.<br /> splitfiles = []<br /> # fld.SPLITFILETRACKER is structured as original path<br /> # of file split, number of file split.<br /> with open(fld.SAMEFOLDERWIN + dirx +<br /> fld.FRONTSLASH + fld.SPLITFILETRACKER, 'r') as f:<br /> for linex in f:<br /> strippedline = [x.strip() for x in linex.split(fld.UCOMMA)]<br /> splitfiles.append(tuple(strippedline))<br /> orignames = [x[0] for x in splitfiles]<br /> splitoriginals = set(orignames)<br /> # Make dictionary that is easy to cycle through.<br /> filesx = {}<br /> for orig in splitoriginals:<br /> basedir, basefolder = s7.getbasedir(orig)<br /> filesx[orig] = {}<br /> filesx[orig][fld.BASEPATH] = fld.SAMEFOLDER + basedir + basefolder[1:]<br /> filesx[orig][fld.FILES] = (fld.SPLITFILE.format(filesx[orig][fld.BASEPATH], filex[1])<br /> for filex in splitfiles if filex[0] == orig)<br /> for orig in filesx:<br /> with open(filesx[orig][fld.BASEPATH], fld.WB) as mainfile:<br /> for filex in filesx[orig][fld.FILES]:<br /> with open(filex, fld.RB) as splitfile:<br /> mainfile.write(splitfile.read())</span><br />
<br />
<span style="font-family: "Courier New", Courier, monospace;">def restore(dirx):<br /> """<br /> Restores MineSight project directory<br /> inside program path.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> dirx is a string for the directory<br /> to be restored (./Fwaulu, for example).</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> Side effect function.<br /> """<br /> extractfiles(dirx)<br /> gluetogethersplitfiles(dirx)<br /> print('Done')</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: inherit;">restore is the main function for the module (uncompression).</span><br />
<br />
Notes:<br />
<br />
1) I don't have admin rights at work and did not have javac (the Java compiler) available. You can download a JDK (Java Development Kit) package from Oracle that includes it. Without admin rights you can't install it normally, but you can still use it. My compilation went something like this:<br /><br /><span style="font-family: "Courier New", Courier, monospace;"><path to downloaded JDK>/bin/javac -cp <path to downloaded 7-ZipJBinding>/lib/* <myclassname>.java</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: inherit;">2) I've left all the split up files and 7z archives in the folder where I decompress my files and recombine the split files. This takes up a lot of space depending on what you're working with. If space is at a premium, you probably want to write jython code to move or delete the archives after uncompressing them.</span><br />
<br />
3) The most time-consuming parts of a run are the compression, uncompression, and splitting and recombining of large files. Porting some of this to java (instead of jython) might speed things up, but I code faster and generally better in jython, and my objective was control, not speed. YMMV (your mileage may vary) with this approach; there are far better general purpose ones.<br /><br />Thanks for stopping by.Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com1tag:blogger.com,1999:blog-524230429673765509.post-13028420910894403262015-12-11T20:57:00.000-08:002015-12-12T18:40:37.439-08:00Improved Storing and Displaying Images in Postgresql - bytea<a href="http://pyright.blogspot.com/2015/10/storing-and-displaying-images-in.html" target="_blank">Last post</a> I brute forced the storage of binary image (jpeg) data as text in a Postgresql database, and likewise brute forced its display in the Unix image viewer feh from the output of a psql query. It was hackish, and I received some negative but constructive criticism on how to improve it:<br />
<br />
1) use Python's base64 module instead of the binascii one.<br />
<br />
2) use bytea as a storage type in Postgresql instead of text.<br />
<br />
<a href="https://www.blogger.com/profile/15155998626202067226" target="_blank">Marius Gedminus</a> made the base64.b64encode suggestion for text. It does make for a little less storage space. Ultimately we won't go with this solution because we want to go with bytea, the Postgresql data type intended for this type of data. But for completeness, here is what a base64.b64encode text solution would look like:<br />
<br />
<b><span style="color: blue;"><span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: small;">$ python3.5 <br />Python 3.5.0 (default, Oct 23 2015, 21:23:18) <br />[GCC 4.2.1 20070719 ] on openbsd5<br />Type "help", "copyright", "credits" or "license" for more information.)<br />>>> import base64<br />>>> f = open('prrrailwhaletankcar.jpg', 'rb')<br />>>> bindata = f.read()<br />>>> f.close()<br />>>> b64data = str(base64.b64encode(bindata))<br />>>> # Converted data to string for write with csv file<br />>>> # to database table text field.<br />>>> # The string representation of BASE 64 includes the<br />>>> # letter b and single quotes.<br />>>> b64data[:10]<br />"b'/9j/4AAQ"<br />>>> b64data[-10:]<br />"AVAH/2Q=='"<br />>>> b64data[1]<br />"'"<br />>>> b64data[-1]<br />"'"<br />>>> # Isolate the BASE 64 digits with the quotes included.<br />>>> substrx = b64data[1:]<br />>>> picdata = base64.b64decode(substrx)<br />>>> f = open('test.jpg', 'wb')<br />>>> f.write(picdata)<br />187810<br />>>> f.close()<br />>>> len(substrx)<br />250418<br />>>> # BASE 64 string is 1 1/3 times as big as the <br />>>> # binary data it represents.<br />>>> _/187810<br />1.3333581811405144<br />>>> # Taking off the quote marks doesn't inhibit the<br />>>> # decoding of the BASE64 string at all - probably<br />>>> # best to go with this less is more approach.<br />subsubstrx = substrx[1:-1]<br />>>> picdata = base64.b64decode(subsubstrx)<br />>>> f = open('test2.jpg', 'wb')<br />>>> f.write(picdata)<br />187810<br />>>> f.close()<br />>>> len(picdata)<br />187810<br />>>> # BASE64 string ever so slightly smaller without<br />>>> # the quote marks (2 chars).<br />>>> len(subsubstrx)<br />250416<br />>>> _/187810<br />1.3333475320802939<br /><br />>>> # Works in both cases.<br />>>> os.system('feh --geometry 400x300+200+200 test.jpg')<br /><br />0<br />>>> os.system('feh --geometry 400x300+200+200 test2.jpg')<br /><br />0<br 
/>>>></span></span></span></b><br />
<br />
<span style="color: blue;"><span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: small;"> </span></span></span>The results for both commands in the last lines (show picture with feh) look the same:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDCAZZG0WQex37IQ8UW_dCs7EPj1lRVtPvCoITetBMs0BUc4rSxc3dvEyV26OKYufNgafS9xBkPcPqBEBeVyYDF8u5z5pvBm1dxy29mJ5lAI8dufW7MZEjkmcEWhjCyfHBwOBGPcGF8n8/s1600/showpic.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="348" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDCAZZG0WQex37IQ8UW_dCs7EPj1lRVtPvCoITetBMs0BUc4rSxc3dvEyV26OKYufNgafS9xBkPcPqBEBeVyYDF8u5z5pvBm1dxy29mJ5lAI8dufW7MZEjkmcEWhjCyfHBwOBGPcGF8n8/s640/showpic.png" width="640" /> </a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
Storing the BASE 64 string in a Postgresql text column is the same as storing the hex one like I did in the last post. The main thing to look out for is properly stripping extra characters from the Python generated string. The leading b has to go, since b is a valid base64 character and would be decoded as data. The single quotes turn out to be harmless, because base64.b64decode discards characters outside the base64 alphabet by default. As I mentioned in the code comments above, knowing what I know now, I would strip them out too, even prior to storing the string in a database.</div>
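One way to avoid stripping the b'...' wrapper at all is to decode the bytes that base64.b64encode returns into a str, rather than calling str() on them. A minimal sketch; the generated sample bytes stand in for the jpeg data read from disk, and the hex comparison uses binascii.hexlify, the hex route from the last post.

```python
import base64
import binascii

# Stand-in for the jpeg bytes read from disk with open(..., 'rb').
bindata = bytes(range(256)) * 4

# .decode('ascii') gives a plain str: no leading b and no quotes to strip.
b64text = base64.b64encode(bindata).decode('ascii')
hextext = binascii.hexlify(bindata).decode('ascii')

print(b64text[:10])
print(len(b64text) / len(bindata))  # base64: roughly 4/3 the binary size
print(len(hextext) / len(bindata))  # hex: exactly 2 times the binary size

# Round trip back to the original bytes.
print(base64.b64decode(b64text) == bindata)
```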
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
On to the Postgresql bytea storage part of the post. Someone I respect asked me on Facebook, "Why didn't you just use bytea (for storage)?" I had to sheepishly own up to just not being used to working with binary data (as opposed to strings) so I went with what I knew. Shame drove me to at least attempt to do things the right way - binary storage for binary data, in this case a jpeg image.</div>
<br />
Postgresql 9.4 uses a hex based representation (<a href="http://www.postgresql.org/docs/9.4/static/datatype-binary.html" target="_blank">hex format</a>) for the bytea data type by default. It is possible to mess this up - it is covered in the doc but I didn't read it carefully enough:<br />
<br />
If you preface your hexadecimal string with \x (single backslash) you will end up with an octal representation of your binary data (digits 0 through 7). Prefacing the hexadecimal string with \\x gives you what you (or at least I) want: a hexadecimal representation of your binary data on output. The SQL string I used for processing my string data (already in the database from my work on the last blog post):<br />
<br />
<br />
<br />
<br />
<b><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;">/* Postgresql SQL code */</span></span></b><br />
<b><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;">CAST('\\x' || <hexadecimal string> AS bytea)</span></span></b><br />
<br />
The || operator is for concatenation of strings (this is probably obvious to Postgresql and other database distro users but MSSQL uses a + symbol so it was a little new to me).<br />
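On the Python side, the bytea hex format (a backslash, the letter x, then two hex digits per byte) is easy to produce and parse with bytes.hex() and bytes.fromhex(), both available in Python 3.5. A minimal sketch; the helper names tobyteahex and frombyteahex are hypothetical. Note that in Python source '\\x' is a two-character string, one backslash followed by x, which is the prefix bytea expects once any SQL or psql escaping has been resolved.

```python
def tobyteahex(data):
    """Render bytes in bytea hex input format: \\x plus hex digits."""
    return '\\x' + data.hex()


def frombyteahex(text):
    """Parse bytea hex format text (as psql prints it) back to bytes."""
    if not text.startswith('\\x'):
        raise ValueError('expected bytea hex format starting with \\x')
    # bytes.fromhex accepts both upper and lower case hex digits.
    return bytes.fromhex(text[2:])


jpegstart = b'\xff\xd8\xff\xe0'
literal = tobyteahex(jpegstart)
print(literal)        # \xffd8ffe0
print(len('\\x'))     # 2: one backslash plus x
print(frombyteahex(literal) == jpegstart)
```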
<br />
To deal with transitioning all my text picture columns to bytea I did the following:<br />
<br />
1) create, in the same database, a new set of tables identical to the old ones, with the same relations among them but pointing at the new tables.<br />
<br />
2) fill the new tables with the data, converting each former text column of binary data to bytea.<br />
<br />
3) delete the old tables once the new ones are filled in.<br />
<br />
4) rename the new tables to match the names of the old ones (how I wanted the database schema to look in the first place).<br />
<br />
<br />
<br />
Postgresql differs from MSSQL in that each database is more of an autonomous entity that has to be connected to other databases through some add-on mechanism. In MSSQL, databases on the same server can reference each other in queries by default. I started looking into the Postgresql fdw (foreign data wrapper) extension, then realized I could do this more easily with the path I took above.<br />
<br />
It's not necessary to post all the SQL code. I used a psql variable in my SQL for the hexadecimal data prefix to make sure I got it right each time. From inside psql I executed the SQL files with the \i metacommand. Here is a snippet with the variable.<br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span style="color: purple;"><br />/* Postgresql SQL code to be used with</span></b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span style="color: purple;"> the Postgresql psql interpreter */</span></b></span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span style="color: purple;">/* Need this for bytea conversion<br /> from hex string */<br />\set byteaidstr '\\x'<br /> </span></b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span style="color: purple;">INSERT INTO locomotiveprototypes2<br /> SELECT keyx,<br /> namex,<br /> railnamex,<br /> paintscheme,<br /> photourl,<br /> comments,<br /> CAST(:'byteaidstr' || picture AS bytea)<br /> FROM locomotiveprototypes;</span></b></span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span style="color: purple;"></span></b></span>The variable thing in psql takes a little getting used to but the <a href="http://www.postgresql.org/docs/9.4/static/app-psql.html#APP-PSQL-INTERPOLATION" target="_blank">Postgresql documentation</a> is good about explaining when and how to use the single quote marks and where to put them. It worked out.<br />
<br />
The most important part: getting the picture to show up from a psql metacommand through a Python script. Here is my modified script, similar to the one in my last post:<br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span style="color: purple;"></span></b></span><br />
<span style="color: blue;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><b><span style="color: purple;"><span style="color: blue;"><span style="font-family: "courier new" , "courier" , monospace;"><b>#!/usr/local/bin/python3.5<br /><br />"""<br />Processing of image coming out<br />of Postgresql query as a stream.<br /><br />Deals with bytea column string<br />output from psql.<br />"""<br /><br />import base64<br />import sys<br />import subprocess<br /><br />DECODED = 'decoded'<br /><br />SIZEMSG = '\nsize of {0:s} output = {1:d}\n'<br />SIZERATIOMSG = '\nsize of {0:s} output/size of binary output = {1:05.5f}\n'<br /><br /># Want to avoid '\\x' in query output.<br />STARTINDEX = 3<br /><br />FEHCMD = ['feh', '--geometry', '400x300+200+200', '-']<br /><br />BYTEAFMT = 'bytea hex format'<br /><br /># 2 variables track changes in size of <br /># hex output from query in psql.<br />sizex = 0<br />lenxbin = 0<br /><br /># Feeding to script straight from<br /># psql \copy metacommand.<br />inputx = sys.stdin.buffer.read()<br />sizex = len(inputx)<br /><br /># print's are mainly for flagging when something goes wrong. <br /># aka debugging<br />print(inputx[:10])<br />print(inputx[STARTINDEX])<br />print(inputx[-10:])<br /><br /># -1 index in slice chops off the return character '\n'<br /># Need casefold=True to deal with lower case from Postgresql.<br />binx = base64.b16decode(inputx[STARTINDEX:-1], casefold=True)<br />lenxbin = len(binx)<br /><br /># print's highlight size relationship between<br /># hex representation and actual binary data.<br />print(SIZEMSG.format(BYTEAFMT , sizex))<br />print(SIZEMSG.format(DECODED, lenxbin))<br />print(SIZERATIOMSG.format(BYTEAFMT, sizex/lenxbin))<br /><br /># Pops up picture on screen.<br />subprocess.run(FEHCMD, input=binx)<br /><br />print('\nDone\n')</b></span></span></span></b></span><br />
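The STARTINDEX of 3 is there because COPY's text output format escapes backslashes, so the \x marker on a bytea value arrives as \\x (three bytes), and each row ends with a newline. A sketch of the same decoding step pulled out into a function (the function name is mine, and it assumes the default hex setting for bytea output):<br />

```python
import base64

# Escaped hex marker as it appears in COPY text output: backslash, backslash, 'x'.
ESCAPED_HEX_PREFIX = b'\\\\x'


def decode_copy_bytea_line(line: bytes) -> bytes:
    """Decode one line of psql \\copy text output for a bytea column."""
    # Drop the trailing newline that ends each COPY row.
    line = line.rstrip(b'\n')
    if not line.startswith(ESCAPED_HEX_PREFIX):
        raise ValueError('expected an escaped \\x hex prefix')
    # casefold=True because Postgresql emits lower-case hex digits.
    return base64.b16decode(line[len(ESCAPED_HEX_PREFIX):], casefold=True)


print(len(ESCAPED_HEX_PREFIX))                     # 3, same as STARTINDEX
print(decode_copy_bytea_line(b'\\\\xffd8ff\n'))    # b'\xff\xd8\xff'
```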
<br />
An important change I made from last time is fixing the call to the image viewer feh to eliminate all that hacky intermediate writing of a jpeg file that took forever (in computer time). It turns out feh accepts binary input from a pipe or stdin just fine - I just needed to read the man page more thoroughly.<br />
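What removed the temporary file is subprocess.run's input argument: it writes the bytes to the child's standard input and then closes it, so a program reading from - (stdin), as feh does here, sees end-of-file once the whole image has arrived. A minimal sketch, using a tiny Python child process as a hypothetical stand-in for feh so it runs anywhere:<br />

```python
import subprocess
import sys


def pipe_bytes_to(command, data: bytes) -> int:
    """Feed data to command on stdin (no temporary file); return its exit status."""
    # input= sends all the bytes, then closes the pipe so the child sees EOF.
    completed = subprocess.run(command, input=data)
    return completed.returncode


# Stand-in for ['feh', '--geometry', '400x300+200+200', '-']:
# a child that exits 0 only if it received exactly 4 bytes.
child = [sys.executable, '-c',
         'import sys; sys.exit(0 if len(sys.stdin.buffer.read()) == 4 else 1)']
print(pipe_bytes_to(child, b'\xff\xd8\xff\xe0'))   # 0
```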
<br />
Now to see if this works:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span style="color: purple;">$ psql hotrains carl<br />Password for user carl: <br />psql (9.4.4)<br />Type "help" for help.<br /><br />hotrains=# \copy (SELECT picture FROM locomotiveprototypes WHERE keyx = 3) to program 'imageshow.py'<br />COPY 1<br />b'\\\\xffd8ffe'<br />102<br />b'8a000ffd9\n'<br /><br />size of bytea hex format output = 1081720<br /><br /><br />size of decoded output = 540858<br /><br /><br />size of bytea hex format output/size of binary output = 2.00001</span></b></span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgscoP2itXw1EkQNlzT-eohkjff7d_hJnO3D-pBotsPuTjPx8W9M5h6xxBWJFvNIwDfyHh-GqB8mN6qAMEAQfSPAbcGU5pdcnlPhy1jWcDoDn6WyCBBLpZcAzVbOWXXBrLjnCTH-hjBxcE/s1600/showpic3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="216" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgscoP2itXw1EkQNlzT-eohkjff7d_hJnO3D-pBotsPuTjPx8W9M5h6xxBWJFvNIwDfyHh-GqB8mN6qAMEAQfSPAbcGU5pdcnlPhy1jWcDoDn6WyCBBLpZcAzVbOWXXBrLjnCTH-hjBxcE/s640/showpic3.png" width="640" /></a></div>
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span style="color: purple;"></span></b></span><span style="font-family: "courier new" , "courier" , monospace;"><b><span style="color: purple;"></span></b></span>And we're good to go.<br />
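The 2.00001 ratio is just what the hex format predicts: two hex digits per byte, plus the three-byte escaped \\x prefix and the trailing newline. A quick check in Python against the numbers from the run above (the function name is mine):<br />

```python
def expected_copy_size(binary_size: int) -> int:
    """Bytes of psql \\copy text output for a bytea value of binary_size bytes."""
    # 2 hex digits per byte, 3 bytes for the escaped \\x prefix, 1 for the newline.
    return 2 * binary_size + 3 + 1


decoded = 540858                     # "size of decoded output" above
hex_size = expected_copy_size(decoded)
print(hex_size)                               # 1081720
print('{:05.5f}'.format(hex_size / decoded))  # 2.00001
```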
<br />
Thanks for stopping by.Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com1tag:blogger.com,1999:blog-524230429673765509.post-35531968279355670552015-10-17T23:36:00.002-07:002017-12-30T07:50:29.340-08:00Storing and Displaying Images in Postgresql<a href="http://pyright.blogspot.com/2015/10/setting-up-toy-postgresql-database-on.html" target="_blank">Last post</a> I set up a toy (literally) Postgresql database for my model train car collection. A big part of the utility of the database is its ability to store images (pictures or photos) of the real life prototype and model train cars. Postgresql (based on my google research) offers a couple methods of doing that. I'll present how I accomplished this here. The method I chose suited my home needs. For a commercial or large scale project, something more efficient in the way of storage and speed of retrieval may be better. Anyway, here goes.<br />
<br />
I chose to store my photos as text representations of binary data in Postgresql database table columns with the text data type. This decision was mainly based on my level of expertise and the fact that I am doing this for home use as part of a learning experience. Storing the binary data as text inflates their size by a factor of two - very inefficient for storage. For home use in a small database like mine, storage is hardly an issue. At work I transfer a lot of binary data (3 dimensional mesh mined solids) to a distant server in text format using MSSQL's bcp. Postgresql is a little different, but I am familiar with the general idea of stuffing a lot of text in a database column.<br />
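The factor of two comes from hexlify writing every byte as two ASCII hex digits. A quick illustration with made-up bytes standing in for a photo:<br />

```python
import binascii

# Made-up stand-in for an image file's contents: 1024 bytes.
photo = bytes(range(256)) * 4

hex_text = binascii.hexlify(photo)
print(len(photo), len(hex_text))     # 1024 2048
print(hex_text[:8])                  # b'00010203'
```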
<br />
In order to get the data into comma delimited rows without dealing with a long, unwieldy string of text from the photos, I wrote a Python script to do it:<br />
<br />
<span style="color: blue;"><span style="font-family: "courier new" , "courier" , monospace;">#!python3.4<br /><br />"""<br />Prepare multiple rows of data<br />that includes a hexlify'd<br />picture for a column in<br />a table in the model train<br />database.<br />"""<br /><br />import binascii<br />import os<br /><br />UTF8 = 'utf-8'<br /># LATIN1 = 'latin-1'<br /><br />INFOFILE = 'infoiii.csv'<br /><br />PICTUREFILEFMT = '{:s}.{:s}'<br />ROWFILEOUTFMT = '{:s}row'<br /><br />JPG = 'jpg'<br />PNG = 'png'<br /><br />COMMA = ','<br /><br />PATHX = '/home/carl/postgresinstall/workpictures/multiplecars/'<br /><br />PATHXOUT = PATHX + 'rows/'<br /><br />PHOTOMSG = 'Now doing photo {:s} . . .'<br /><br />def checkfileextension(basename):<br /> """<br /> With the basename of an image file<br /> returns True for jpg and false for<br /> anything else (png).<br /> """<br /> if os.path.exists(PATHX +</span></span><br />
<span style="color: blue;"><span style="font-family: "courier new" , "courier" , monospace;"> PICTUREFILEFMT.format(basename, JPG)):<br /> return True<br /> else:<br /> return False<br /><br />with open(PATHX + INFOFILE, 'r', encoding=UTF8) as finfo:<br /> for linex in finfo:<br /> newlineparts = [x.strip() for x in linex.split(COMMA)]<br /> photox = newlineparts.pop()<br /> print(PHOTOMSG.format(photox))<br /> # Check for jpg or png here<br /> # XXX - this could be better - could actually<br /> # check and return actual extension;<br /> # more code and lazy.<br /> extension = ''<br /> if checkfileextension(photox):<br /> extension = JPG<br /> else:<br /> extension = PNG<br /> with open(PATHX +</span></span><br />
<span style="color: blue;"><span style="font-family: "courier new" , "courier" , monospace;"> PICTUREFILEFMT.format(photox,</span></span><br />
<span style="color: blue;"><span style="font-family: "courier new" , "courier" , monospace;"> extension), 'rb') as fphoto:<br /> contents = binascii.hexlify(fphoto.read())<br /> liney = COMMA.join(newlineparts)<br /> liney += COMMA<br /> liney = bytes(liney, UTF8)<br /> liney += contents<br /> with open(PATHXOUT +</span></span><br />
<span style="color: blue;"><span style="font-family: "courier new" , "courier" , monospace;"> ROWFILEOUTFMT.format(photox), 'wb') as frow:<br /> frow.write(liney)<br /><br />print('\nDone\n')</span></span><br />
<br />
The basic gist of the script is to take each line of the info file, append the hex text of the photo it names, and write the result to its own "row" file that can later be imported into a Postgresql table. The paths in the capitalized "constants" would have to be adjusted for your situation (I tend to go overboard on capitalized constants because I'm a lousy typist and want to avoid screwing up and then having to debug my typos). The INFOFILE referred to in the script has roughly the following format:<br />
<br />
<column1data>, <column2data>, . . . , <photofilename><br />
<br />
So the idea is to take a comma delimited file, encode each line in UTF-8, and stuff the binary data from the (correct) photo onto the end as hex text. In every table that holds photos, I made the text column for the picture data (I use the column name "picture") the last column - this is kind of a hack, but it made scripting this easier.<br />
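Stripped of the file handling, the row-building step in the script above boils down to something like the following sketch (build_row and the sample values are hypothetical, made up for illustration). One caveat: COPY's text format would read a comma inside a comments field as a column separator, so such fields would need escaping or a different delimiter.<br />

```python
import binascii

COMMA = ','
UTF8 = 'utf-8'


def build_row(fields, photo_bytes: bytes) -> bytes:
    """Comma-join the text fields, then append the photo as hex text."""
    # Text columns come first, UTF-8 encoded; the hex picture data goes
    # last, matching a table whose final column is the text "picture" column.
    head = (COMMA.join(x.strip() for x in fields) + COMMA).encode(UTF8)
    return head + binascii.hexlify(photo_bytes)


row = build_row(['G-39A Ore Jenny', 'http://example.com/g39a', 'Ore car'],
                b'\xff\xd8')
print(row)   # b'G-39A Ore Jenny,http://example.com/g39a,Ore car,ffd8'
```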
<br />
An example of importing one of these "row" files into the database table from within psql:<br />
<br />
<span style="color: #660000;"><span style="font-family: "courier new" , "courier" , monospace;">$ psql hotrains carl<br /><span style="color: #4c1130;">Password for user carl:<br />psql (9.4.1)<br />Type "help" for help.<br /><br />hotrains=# \d<br /> List of relations<br /> Schema | Name | Type | Owner <br />--------+------------------------+-------+-------<br /> public | rollingstockprototypes | table | carl<br />(1 row)<br /><br />hotrains=# \d rollingstockprototypes<br /> Table "public.rollingstockprototypes"<br /> Column | Type | Modifiers <br />----------+------------------------+-----------<br /> namex | character varying(50) | not null<br /> photourl | character varying(150) | not null<br /> comments | text | not null<br /> picture | text | not null<br />Indexes:<br /> "rsprotoname" PRIMARY KEY, btree (namex)<br /><br />hotrains=# COPY rollingstockprototypes FROM '/home/carl/postgresinstall/G39Arow' (DELIMITER ',');<br /><br />COPY 1</span></span></span><br />
<br />
My Python script for actually displaying a photo or image is a little hacky in that it requires checks for the size of the output versus the size of the information pulled from the Postgresql database table. My original script would show only part of the picture piped to the lightweight UNIX image viewer <a href="http://feh.finalrewind.org/" target="_blank">feh</a>, because the hex file had not finished being written. To get around this I put a timed loop in the script that waits for the file to stop growing and then checks that the image data are about half the size of the text data pulled. It works well enough, if slowly at times:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: blue;">#!/usr/local/bin/python3.4<br /><br />"""<br />Try to mimic processing of image<br />coming out of postgresql query<br />as a stream.<br />"""<br /><br />import binascii<br />import os<br />import time<br />import sys<br /><br />import argparse<br /><br /># Name of file containing psql \copy hex output (text).<br />HEXFILE = '/home/carl/postgresinstall/workpictures/hexoutput'<br /><br /># 2.5 seconds max delay before abort.<br /># Enough time to write most big pixel<br /># jpg's, it appears.<br />MAXTIME = 2.5<br />PAUSEX = 0.25<br /><br /># Argument name.<br />PICTURENAME = 'picturename'<br /><br />parser = argparse.ArgumentParser()<br />parser.add_argument(PICTURENAME)<br />args = parser.parse_args()<br />print(args.picturename)<br /><br /># Name of picture file<br /># written from hex query.<br />PICNAME = args.picturename<br /><br /># Extensions feh recognizes.<br />PNG = 'png'<br />JPG = 'jpg'<br /><br />FILEEXTENSIONMSG = '\nFile extension {:s} detected.\n'<br />UNRECOGNFILENAME = '\nUnrecognized file extension for picture '</span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: blue;">UNRECOGNFILENAME += '{:s}\n'<br />ABORTMSG = '\nSorry, no data available for feh. Aborting.\n'<br /><br />SLEEPMSG = '\nSleeping {:2.2f} seconds . . .\n'<br /><br />SIZEHEXFILEMSG = '\nsize of hex output = {:d}\n'<br />SIZEBINARYMSG = '\nsize of binary file = {:d}\n'<br />SIZERATIOMSG = '\nsize of hex output/size of binary file '</span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: blue;">SIZERATIOMSG += '{:05.5f}\n'<br /><br />ACCEPTABLEHEXTOBINRATIO = 1.99<br />ABORTMSGTOOSMALL = '\nSorry, not enough data to show a '</span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: blue;">ABORTMSGTOOSMALL += 'complete picture. Aborting.\n'<br /><br />extension = PICNAME[-3:]<br />if extension == PNG:<br /> print(FILEEXTENSIONMSG.format(PNG))<br />elif extension == JPG:<br /> print(FILEEXTENSIONMSG.format(JPG))<br />else:<br /> print(UNRECOGNFILENAME.format(extension))<br /> print(ABORTMSG)<br /> sys.exit()<br /><br />PICFILEFMT = '/home/carl/postgresinstall/workpictures/{:s}'<br />FEHFMT = 'feh -g 400x300+200+200 {:s}'<br /><br /># Length of binary string.<br />lenx = 0<br /># 2 variables track changes in size of <br /># hex output from query in psql.<br />sizex = 0<br />sizexnew = 0<br /># Tracks time spent sleeping.<br />totaltimewait = 0.0<br /><br />while totaltimewait < MAXTIME:<br /> # Try to make sure hex file is completely written.<br /> sizexnew = os.path.getsize(HEXFILE)<br /> if sizexnew > sizex or sizexnew == 0:<br /> sizex = sizexnew<br /> print(SLEEPMSG.format(PAUSEX))<br /> time.sleep(PAUSEX)<br /> totaltimewait += PAUSEX<br /> elif sizexnew == sizex:<br /> with open(HEXFILE, 'rb') as f2:<br /> with open(PICFILEFMT.format(PICNAME), 'wb') as f:<br /> strx = binascii.unhexlify(f2.read().strip())<br /> lenx = len(strx)<br /> print(SIZEHEXFILEMSG.format(sizexnew))<br /> print(SIZEBINARYMSG.format(lenx))<br /> print(SIZERATIOMSG.format(sizexnew/lenx))<br /> f.write(strx)<br /> break<br /><br /># I don't want part of a picture.<br /># lenx stays 0 if the loop timed out before reading the file.<br />if not (lenx > 0 and<br /> sizexnew/lenx > ACCEPTABLEHEXTOBINRATIO):<br /> print(ABORTMSGTOOSMALL)<br /> sys.exit()<br /><br /># Pops up picture on screen.<br />os.system(FEHFMT.format(PICFILEFMT.format(PICNAME)))<br /><br />print('\nDone\n')</span></span><br />
<br />
Let's see if we can get a look at this in action - example of call from within psql:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: #4c1130;">hotrains=# \copy (SELECT decode(picture, 'hex') FROM rollingstockprototypes WHERE namex = 'G-39A Ore Jenny') to program 'cat > </span></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: #4c1130;"><span style="font-family: "courier new" , "courier" , monospace;">/home/carl/postgresinstall/workpictures/hexoutput</span> | imageshowiii.py'<br />COPY 1<br />hotrains=#</span></span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: #4c1130;"></span></span>And a screenshot of a (hopefully acceptable) result:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXuFBz1A_wQzlFc_HxruvCVDjNhXBULWvoQzrmNkfad1Fb-O60uNEsi3uTm7rhGNHN49Ov3tToivaRZxde7g53sQSGgbmT5wpuwW3_2PXrSNnCDLXsHnFDPBTenwGwWbYkrf2yyi4KFhQ/s1600/successfulimagedisplayPRR.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="289" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXuFBz1A_wQzlFc_HxruvCVDjNhXBULWvoQzrmNkfad1Fb-O60uNEsi3uTm7rhGNHN49Ov3tToivaRZxde7g53sQSGgbmT5wpuwW3_2PXrSNnCDLXsHnFDPBTenwGwWbYkrf2yyi4KFhQ/s640/successfulimagedisplayPRR.png" width="640" /></a></div>
<br />
Depending on the directory I started psql from, I may have to type the full paths of the output file and the Python script.<br />
<br />
There is more I could do with this, but for now I'm OK with it. Writing to a file and then checking on its size is slow. There is probably a way to write to memory and check what's there, but I got stuck on that and decided to go with the less efficient solution.<br />
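One way to skip the file and the size polling, sketched under two assumptions: psql hands the data straight to the script on standard input (\copy ... to program 'script.py' with no cat > file step), and the query selects the plain hex text column picture rather than decode(picture, 'hex'), whose bytea output would arrive with an escaped \x prefix. Reading the stream to end-of-file waits until psql has written everything, so there is no partially written picture to detect:<br />

```python
import base64
import io


def read_hex_stream(stream) -> bytes:
    """Read hex text from a binary stream until EOF and decode it to bytes."""
    # read() with no size argument returns only once the writer closes the
    # pipe, so the data cannot be partial the way a still-growing file can.
    text = stream.read().strip()
    # casefold=True accepts lower-case hex digits.
    return base64.b16decode(text, casefold=True)


# In the real script the stream would be sys.stdin.buffer; BytesIO stands in here.
picture = read_hex_stream(io.BytesIO(b'ffd8ffe0\n'))
print(picture)   # b'\xff\xd8\xff\xe0'
```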
<br />
Thanks for stopping by.Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com1tag:blogger.com,1999:blog-524230429673765509.post-83402835911856431832015-10-17T21:53:00.000-07:002015-10-17T21:53:32.652-07:00Setting Up Toy Postgresql Database on OpenBSD<i>This isn't a Python scripting post, but the next one will be on the same topic. In this post I get a Postgresql database set up on my OpenBSD laptop and get familiar with the Postgresql environment.</i><br />
<br />
I primarily use Microsoft SQL Server and vendor supplied database schemas at work. I know Postgresql has a good reputation among open source databases, but I haven't had an opportunity to use it in a work environment (I had a brief brush with <a href="https://www.codelco.com/flipbook/codelcodigital5/pdfs/10_GA_1_JonathanOlson_JIGSAW.pdf" target="_blank">Jigsaw</a> years back - a competitor to <a href="http://www.modularmining.com/" target="_blank">Modular's MSSQL-based Powerview (Dispatch) in pit mining truck tracking database</a> - but that doesn't count.)<br />
<br />
Anyway, as I've noted in <a href="http://pyright.blogspot.com/2015/05/lenovo-thinkpad-x201-fan-replacement.html" target="_blank">previous posts</a>, I run OpenBSD as my operating system on my laptop at home. The OpenBSD project has a package for Postgresql.<br />
<br />
The first order of business is to install the Postgresql server package. First, I'll set up a PKG_PATH FTP mirror location from within the ksh shell:<br />
<br />
<span style="color: #660000;"><span style="font-family: "Courier New", Courier, monospace;">$ export PKG_PATH=ftp://ftp3.usa.openbsd.org/pub/OpenBSD/5.7/packages/i386/</span></span><br />
<br />
That ftp3.usa.openbsd.org server is the one in Boulder, Colorado - that's the one I usually use. I'm in Tucson, Arizona in the Mountain timezone, so it kind of makes sense to use that one. My understanding is that, in general, you want to use a mirror away from the main one to spread out the bandwidth and server use for the OpenBSD (or any other open source) project.<br />
<br />
Now to install the package - this has to be done as root. I use sudo for this (as I understand it, sudo's replacement in OpenBSD 5.8 will be <a href="http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man1/doas.1" target="_blank">doas(1)</a>, although you'll still be able to get sudo(1) as a package).<br />
<br />
<span style="color: #660000;"><span style="font-family: "Courier New", Courier, monospace;">$ echo $PKG_PATH</span><span style="font-family: "Courier New", Courier, monospace;"> </span></span><br />
<span style="color: #660000;"><span style="font-family: "Courier New", Courier, monospace;"> </span><a href="ftp://ftp3.usa.openbsd.org/pub/OpenBSD/5.7/packages/i386/"><span style="font-family: "Courier New", Courier, monospace;">ftp://ftp3.usa.openbsd.org/pub/OpenBSD/5.7/packages/i386/</span></a></span><br />
<br />
<span style="color: #660000;"><span style="font-family: "Courier New", Courier, monospace;">$ sudo pkg_add postgresql-server<br />quirks-2.54 signed on 2015-03-09T11:04:08Z<br />No change in quirks-2.54<br />postgresql-server-9.4.1p1 (extracting)<br />1%<br />1%<br />2%<br />3% ********</span></span><br />
<span style="color: #660000;"><br /></span>
<span style="color: #660000;"><span style="font-family: Courier New;"><etc.></span></span><br />
<span style="color: #660000;"><br /></span>
<span style="color: #660000;"><span style="font-family: Courier New;">100%<br />postgresql-server-9.4.1p1 (installing)<br />0% useradd: Warning: home directory `/var/postgresql' doesn't exist, and -m was not specified<br />postgresql-server-9.4.1p1 (installing)|<br />1%<br />1%<br />2%<br />3% ********</span></span><br />
<span style="color: #660000;"><br /></span>
<span style="color: #660000;"><span style="font-family: Courier New;"><etc.></span><span style="font-family: "Courier New", Courier, monospace;"> </span></span><br />
<span style="color: #660000;"><br /></span>
<span style="color: #660000;"><span style="font-family: "Courier New", Courier, monospace;">100%</span></span><br />
<span style="color: #660000;"><span style="font-family: "Courier New", Courier, monospace;"><br />postgresql-server-9.4.1p1: ok<br />The following new rcscripts were installed: /etc/rc.d/postgresql<br />See rcctl(8) for details.<br />Look in /usr/local/share/doc/pkg-readmes for extra documentation.<br />$</span></span><br />
<br />
Given an internet connection with decent speed, this all goes pretty quickly. The first set of percentages is the download and extraction of the gzipped tar package; the second is the install of the Postgresql binaries into the proper locations in the operating system file hierarchy.<br />
<br />
For years I had some trouble getting my head around setting up users for Postgresql and running the daemon. Much of my database experience is as an application user at work using Microsoft SQL Server. We use Windows Authentication there primarily. Working on my own UNIX-based (OpenBSD) home system is a little different.<br />
<br />
Most of the problems I've had overcoming this user/security hump came from my lack of a good, strong grasp of UNIX users and permissions (a could-do-it-in-my-sleep strong grasp). OpenBSD is somewhat unusual in that it has a special name for the Postgresql unprivileged user: _postgresql. The leading underscore is an OpenBSD convention for this general class of user: an account usually associated with a daemon that runs on startup or gets started by root, and that has no login (nor a password). Michael Lucas spends several pages giving a good summary of the rationale behind this, its history and its conventions in his <a href="http://www.google.com/imgres?imgurl=http://t1.gstatic.com/images%3Fq%3Dtbn:ANd9GcSIoworHG-P2f80pSn8Qq7dpbvVCXey98Ms_ZN6uBbhG6CCWszu&imgrefurl=http://books.google.com/books/about/Absolute_OpenBSD.html?id%3DPN6Xy9zWAbsC%26source%3Dkp_cover&h=648&w=490&tbnid=7aOmmzt0VgInTM:&tbnh=160&tbnw=120&usg=__841XpHUA5XUq7ZAo1JyjV72COtg=&docid=V6uCv9wQxBXMEM&itg=1" target="_blank">authoritative OpenBSD book</a>.<br />
<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiqaRHXtpZz2kaViOKuPIHk81YcehmNTnHVhKLVqV9vwhC3NtAElQ5jEart3p4Ot7kUj4dLZQ7ZrFI8PyOI2pF5n_NlzOqDX_yuiwrLHgYejFGoTUaoIhM1IO22FlXGTROLgdGmBkzNFMs/s1600/AbsoluteOpenBSD.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiqaRHXtpZz2kaViOKuPIHk81YcehmNTnHVhKLVqV9vwhC3NtAElQ5jEart3p4Ot7kUj4dLZQ7ZrFI8PyOI2pF5n_NlzOqDX_yuiwrLHgYejFGoTUaoIhM1IO22FlXGTROLgdGmBkzNFMs/s320/AbsoluteOpenBSD.jpg" width="240" /></a></div>
<br />
So, we want to take a look at the directory designated for Postgresql's data, /var/postgresql:<br />
<span style="color: purple;"><br /><span style="font-family: "Courier New",Courier,monospace;">$ ls -lah /var | grep post</span></span><br />
<span style="color: purple;"><span style="font-family: "Courier New",Courier,monospace;"><br /></span></span>
<span style="color: purple;"><span style="font-family: "Courier New",Courier,monospace;">drwxr-xr-x 2 _postgresql _postgresql 512B May 19 17:52 postgresql</span></span><br />
<span style="color: purple;"><span style="font-family: "Courier New",Courier,monospace;"><br /></span></span>
<span style="color: purple;"><span style="font-family: "Courier New",Courier,monospace;">$ cd postgresql</span></span><br />
<br />
There is no data directory there (just . and .. in the /var/postgresql directory - the 2 in the ls output). This is typically where I would get stuck in the past. I ended up doing it manually . . . and wrong, or at least in a way that was more difficult than necessary. Anyway, I recorded it that way, so I'll blog it as executed.<br /><br />What I had difficulty understanding before was the whole unprivileged user concept. Basically you need to use su to log on as root, then further su to log on as _postgresql:<br /><br /># THIS IS AN UNNECESSARY STEP - DON'T DO THIS<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"><span style="color: purple;">$ su<br />Password:<br /># su - _postgresql<br />$ mkdir /var/postgresql/data<br />$ ls -lah /var/postgresql<br />total 12<br />drwxr-xr-x 3 _postgresql _postgresql 512B Jun 4 19:06 .<br />drwxr-xr-x 23 root wheel 512B May 19 17:52 ..<br />drwxr-xr-x 2 _postgresql _postgresql 512B Jun 4 19:06 data<br />$ exit<br /># exit<br />$</span> </span><br />
<br />
# END UNNECESSARY STEP<br />
<br />
Now I need a database cluster. I want to initialize it with support for UTF-8 because I have some text data with umlauts in it (non-ASCII):<br />
<br />
<span style="color: #660000;"><span style="font-family: "Courier New",Courier,monospace;">$ su<br />Password:<br /># su - _postgresql<br />$ initdb -D /var/postgresql/data -U postgres -A md5 -E UTF8 -W</span></span><br />
<span style="color: #660000;"><span style="font-family: "Courier New",Courier,monospace;"><br />The files belonging to this database system will be owned by user "_postgresql".<br />This user must also own the server process.<br /><br />The database cluster will be initialized with locale "C".<br />The default text search configuration will be set to "english".<br /><br />Data page checksums are disabled.<br /><br /><span style="background-color: yellow;">fixing permissions on existing directory /var/postgresql/data ... ok</span><br />creating subdirectories ... ok<br />selecting default max_connections ... 30<br />selecting default shared_buffers ... 128MB<br />selecting dynamic shared memory implementation ... posix<br />creating configuration files ... ok<br />creating template1 database in /var/postgresql/data/base/1 ... ok<br />initializing pg_authid ... ok<br />Enter new superuser password: <br />Enter it again: <br />setting password ... ok<br />initializing dependencies ... ok<br />creating system views ... ok<br />loading system objects' descriptions ... ok<br />creating collations ... not supported on this platform<br />creating conversions ... ok<br />creating dictionaries ... ok<br />setting privileges on built-in objects ... ok<br />creating information schema ... ok<br />loading PL/pgSQL server-side language ... ok<br />vacuuming database template1 ... ok<br />copying template1 to template0 ... ok<br />copying template1 to postgres ... ok<br />syncing data to disk ... ok<br /><br />Success. You can now start the database server using:<br /><br /> postgres -D /var/postgresql/data<br />or<br /> pg_ctl -D /var/postgresql/data -l logfile start<br /><br />$ exit<br /># exit<br />$ whoami<br />carl<br />$ pwd<br />/home/carl</span></span><br />
<br />A couple of things:<br /><br />1) There's a line in the output about fixing permissions on the existing data directory (it shows up highlighted on the blog, but possibly not in the planetpython blog feed). Had I done this correctly (just let initdb make the directory itself), that line would look something like this (I created another cluster while writing the blog just so I would understand how to do it right):<br /><span style="color: #660000;"><span style="font-family: "Courier New",Courier,monospace;"><br />creating directory /var/postgresql/data4 ... ok </span></span><br />
<br />
Right there in the initdb(1) man page: "Creating a database cluster consists of creating the directories in which the database data will live . . ." The man page goes on to explain how to get around permission problems, etc. in this process. Note to self: read the man page . . . carefully.<br />
<br />
2) I also learned that you can make as many database clusters as you want, each with its own data directory. postgres is the superuser name you see in the documentation and /var/postgresql/data is the usual directory, but, as the output above shows, you could put your data in a folder called data4. If you gave a different name with the -U switch to initdb, the superuser name would be different too. Or you could have more than one cluster, each with a superuser named postgres but with a different password.<br />
<br />
All that said, one cluster per physical box and the conventional names are plenty for me - I'm just trying to get used to the Postgresql environment and get started.<br />
<br />
At this point I need to start up the Postgresql daemon. In the package install above, the output mentions an rc script /etc/rc.d/postgresql. This is run by root - below is a demo of using it manually with su (instead of using it as part of an rc startup sequence at boot):<br />
<br />
<span style="color: purple;"><span style="color: #660000;"><span style="font-family: "Courier New",Courier,monospace;">$ su<br />Password:<br /><br /># /etc/rc.d/postgresql start <br />postgresql(ok)<br /># pgrep postgres<br />6960<br />10175<br />4748<br />29053<br />32758<br />26201<br /># /etc/rc.d/postgresql stop <br />postgresql(ok)<br /># pgrep postgres</span></span> </span><br />
<br />
All I did there was start the Postgresql daemon with the installed rc script, check that its associated processes were running, then stop the daemon with the same script.<br />
<br />
Me being me, I can't leave well enough alone. I want control over starting and stopping the daemon when I decide to (I am running this on a laptop). As I understand it, pg_ctl is a utility provided with the Postgresql install for starting, stopping and otherwise controlling the server at a lower level than the rc script. I use pg_ctl to start the daemon under the _postgresql user account:<br />
<br />
<span style="color: #660000;"><span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: "Courier New",Courier,monospace;">$ su<br />Password:</span> </span></span><br />
<span style="color: #660000;"><span style="font-family: "Courier New",Courier,monospace;"># su - _postgresql<br />$ pg_ctl -D /var/postgresql/data -l firstlog start<br />server starting<br />$ exit<br /># exit<br />$</span></span><br />
<br />
I asked pg_ctl to make a specific log file for this session (firstlog - this will go in directory /var/postgresql/). The logs are human readable and I wanted to study them later to see what's going on (there's all kinds of stuff in there about autovacuum and what not - sorry, we're not covering that in this blog post - but I'll have it available later).<br />
<br />
Shutting down (stopping) the daemon is pretty simple with pg_ctl - just a few more keystrokes than if I had done it from root with the rc script:<br />
<br />
<span style="color: purple;"><span style="color: #660000;"><span style="font-family: "Courier New",Courier,monospace;">$ su<br />Password:<br /># su - _postgresql<br />$ pg_ctl -D /var/postgresql/data stop<br />waiting for server to shut down.... done<br />server stopped<br />$ exit<br /># exit<br />$ whoami<br />carl<br />$</span></span> </span> <br />
<br />
Great - so I'm good for getting the daemon going when I want it and for designating my own specific log files per session. Now to create a user and get to work:<br />
<br />
(with daemon running):<br />
<span style="font-family: "Courier New",Courier,monospace;"><br /></span>
<span style="color: #660000;"><span style="font-family: "Courier New",Courier,monospace;">$ psql -U postgres<br /><span style="color: #4c1130;">Password for user postgres:<br />psql (9.4.1)<br />Type "help" for help.<br /><br />postgres=# CREATE ROLE carl SUPERUSER;<br />CREATE ROLE<br />postgres=# ALTER ROLE carl PASSWORD 'xxxxxxxx';<br />ALTER ROLE<br />postgres=# ALTER USER carl PASSWORD 'xxxxxxxx' LOGIN;<br />ALTER ROLE<br />postgres=# \q</span></span></span><br />
<span style="color: #660000;"><span style="font-family: "Courier New",Courier,monospace;">$</span></span><br />
<br />
I created a user/role carl with SUPERUSER capabilities within this instance of Postgresql. It's a bit ugly and I'm not sure I've done this correctly or in the easiest way. Also, and of importance, I have given Postgresql user carl (not OpenBSD user carl) all permissions on everything. Really, carl only needs permissions on the database he's working on. Josh Drake (@linuxhiker on twitter) pointed this out to me. I am grateful for this. He is right. I am lazy.<br />
<br />
Now to create my database. I got into model trains around Christmas of 2015 and went crazy collecting stuff and setting up a layout. I needed to somehow keep track of all the cars before it all got too unwieldy.<br /><br />
<span style="color: #660000;"><span style="font-family: "Courier New",Courier,monospace;">$ psql postgres carl<br /><span style="color: #4c1130;">Password for user carl:<br />psql (9.4.1)<br />Type "help" for help.<br /><br />postgres=# CREATE DATABASE hotrains;<br />CREATE DATABASE<br />postgres=# \q</span><br />$ </span></span><br />
<br />
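The command line entry to start psql is something I was a bit fuzzy on. The two positional arguments are the database name and the user name, so psql postgres carl connects as carl to the database named postgres. That database is the default one initdb creates, which makes it a handy place to connect when you don't yet have a database of your own to work on.<br />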
<br />
I'm not going to post the full database code for the sake of brevity - it's only 11 tables but that's a bit much for a blog post. Instead I'll post a graphic schema I made and talk to it a little bit before posting one related SQL code snippet. <br />
<br />Disclaimer: I'm not a designer. This schema diagram I did with <a href="https://en.wikipedia.org/wiki/Dia_%28software%29" target="_blank">Dia</a>, a fairly lightweight Linux/UNIX desktop tool for flowcharts and stuff. I've never met a color palette or font choice I could simply let be. Asking me to do a flowchart with a lot of leeway on design is like leaving a two year old home alone with a Crayola 64 pack of crayons and the 300 year old family Bible - it can't end well.<br /><br />All that said, I find schema diagrams helpful for showing relationships between tables and having an ugly one is better than none at all. I've embedded an svg version of it below; hopefully it shows up on the planetpython feed:<br />
<br />
<br />
<iframe height="550" src="https://drive.google.com/file/d/0B_keTR2WNh2LbV9QdVotRkZaZnM/preview" width="640"></iframe><br />
<br />
The focus of my crude toy database design was the use of foreign keys to maintain consistency in naming things I want to track: rail name, for example. I went with "Santa Fe" where I could have gone with (and probably should have) "<a href="https://www.youtube.com/watch?v=E7NHWKbQuRw" target="_blank">ATSF</a>." It doesn't matter as long as it's consistent and I know what it means.<br />
<br />
Years ago I was called in to do some work on a blasting database at the mine. There weren't any constraints on the entry of names of blasting materials, but what could go wrong? There were only three or four products with four digit designators and "None." Well . . . it was a mess. I didn't want to take any chances on having a situation like that again, even, or especially, if I was doing all of the data entry. Foreign keys it was!<br />
<br />
Here's a quick dump of the code I used to create the validsidenumbers table. The idea is to make sure the rail line or company name is consistent in all the side number records (yes, I did actually purchase some identical rolling stock with the exact same side numbers - it's a long story):<br /><br /><span style="color: #4c1130;"><span style="font-family: "Courier New",Courier,monospace;">hotrains=# CREATE TABLE validsidenumbers (<br />railnamex varchar(50) REFERENCES validraillines (namex),<br />sidenumber varchar(50),<br />comments text,<br />PRIMARY KEY (railnamex, sidenumber)<br />);<br />CREATE TABLE<br />hotrains=# </span></span><br />
<br />
That REFERENCES keyword sees to it that I won't enter anything typo'd or goofy into that railnamex column.<br />
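To see the REFERENCES constraint do its job without a PostgreSQL server handy, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. The shape of validraillines (just a namex primary key) is an assumption, since only its namex column appears here, and the side numbers are made up for illustration. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, whereas PostgreSQL enforces them by default.<br />

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for two of the hotrains tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on (per connection).
    conn.execute("PRAGMA foreign_keys = ON")
    # Assumed shape: the post only shows that namex is the referenced column.
    conn.execute("CREATE TABLE validraillines (namex VARCHAR(50) PRIMARY KEY)")
    conn.execute(
        """CREATE TABLE validsidenumbers (
               railnamex VARCHAR(50) REFERENCES validraillines (namex),
               sidenumber VARCHAR(50),
               comments TEXT,
               PRIMARY KEY (railnamex, sidenumber))"""
    )
    conn.execute("INSERT INTO validraillines VALUES ('Santa Fe')")
    return conn


def try_insert(conn, railname, sidenumber):
    """Return True if the row went in, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO validsidenumbers (railnamex, sidenumber) VALUES (?, ?)",
                (railname, sidenumber),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_demo_db()
print(try_insert(conn, "Santa Fe", "12345"))  # True
print(try_insert(conn, "Sante Fe", "12345"))  # False - typo'd rail name
```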
<br />
Next post is a Python one about storing images of the train cars in the database and displaying them from within psql.<br /><br />Thanks for stopping by.Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com8tag:blogger.com,1999:blog-524230429673765509.post-44783175155398723952015-09-26T19:56:00.000-07:002015-09-27T21:55:30.595-07:00MSSQL sqlcmd -> bcp csv dump -> ExcelA couple months back I had a one-off assignment to dump some data from a vendor-provided relational database to a csv file and then from there to Excel (essentially a fairly simple ETL - extract, transform, load exercise). It was a little trickier than I had planned. Disclaimer: this may not be the best approach, but it worked . . . at least twice . . . on two different computers and that was sufficient.<br />
<br />
<b>Background:</b><br />
<br />
<b>Database:</b> the relational database provided by the vendor is the back end to a graphic mine planning application. It does a good job of storing geologic and mine planning data, but requires a little work to extract the data via SQL queries. <br />
<br />
<b>Weighted Averages:</b> specifically, the queries are required to do tonne-weighted averages and binning. Two areas that I've worked in, mine planning and mineral processing (mineral processing could be considered a subset of metallurgy or chemical engineering), require a lot of work with weighted averages. Many of the database programming examples online deal with retail and focus on sales, typically as sums of sales by location. A weighted average by tonnes (or by gallons of flow) requires a bit more SQL code.<br />
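The arithmetic behind a tonne-weighted average is simple: multiply each tonnage by its grade, sum those products, and divide by the total tonnes, i.e. SUM(tonnes * grade) / SUM(tonnes). A minimal Python sketch (the function name is mine, not from any library), fed the tonnes and Au grades of cuts 1 and 2 (the Level23East drift) from the mock data in this post:<br />

```python
def tonne_weighted_average(rows):
    """rows: iterable of (tonnes, grade) pairs.

    Returns (total tonnes, tonne-weighted grade).
    """
    total_tonnes = 0.0
    metal = 0.0  # running sum of tonnes * grade (grams, for g/tonne grades)
    for tonnes, grade in rows:
        total_tonnes += tonnes
        metal += tonnes * grade
    if total_tonnes == 0:
        raise ValueError("cannot average over zero tonnes")
    return total_tonnes, metal / total_tonnes


# Cuts 1 and 2 (drift Level23East): tonnes and Au g/tonne from the mock data.
level23east_au = [(28437.0, 6.44), (13296.0, 7.83)]
tonnes, au = tonne_weighted_average(level23east_au)
print(tonnes, round(au, 3))  # prints 41733.0 6.883
```

A plain average of the two grades would give 7.135; weighting by tonnes pulls the result toward the bigger cut, which is the whole point.<br />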
<br />
<b>Breaking Up the SQL and the CSV Dump Problem:</b> in order to break the weighted average and any associated binning into smaller, manageable chunks of functionality, I used MSSQL (Microsoft SQL Server) global temporary tables in my queries. Having my final result set in one of these global temporary tables allowed me to dump it to a csv file using the MSSQL bcp utility. There are other ways to get a result set and produce a csv file from it with Python. I wanted to isolate as much functionality within the MSSQL database as possible. Also, the bcp utility gives some feedback when it fails - this made debugging or troubleshooting the one-off script easier, for me, at least.<br />
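For comparison, one of those other ways is to pull the result set through a Python database driver and write it with the standard csv module. The sketch below is hedged: it uses sqlite3 purely as a stand-in, and assumes a SQL Server DB-API driver such as pyodbc (not used in this post) would hand back a cursor with the same description and row iteration. With global temporary tables, the query would also have to run on the same connection that created them.<br />

```python
import csv
import io
import sqlite3


def cursor_to_csv(cursor, outfile):
    """Write a DB-API cursor's result set, header row first, to an open text file."""
    writer = csv.writer(outfile)
    # cursor.description holds one tuple per column; the name comes first.
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)


# sqlite3 stands in for the SQL Server connection here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tonnes (cutid INT, tonnes FLOAT)")
conn.executemany("INSERT INTO tonnes VALUES (?, ?)", [(1, 28437.0), (2, 13296.0)])

buf = io.StringIO()
cursor_to_csv(conn.execute("SELECT cutid, tonnes FROM tonnes ORDER BY cutid"), buf)
print(buf.getvalue())
```

For a real file, open it with newline='' so the csv module controls the line endings. The tradeoff versus bcp is that errors surface as Python exceptions rather than bcp's own messages.<br />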
<br />
As far as the SQL goes, I may have been able to do this with a single query without too much trouble. There are tools within Transact-SQL for pivoting data and doing the sort of things I naively and crudely do with temporary tables. That said, in real life, the data are seldom this simple and this clean. There are far more permutations and exceptions. The real life version of this problem has fourteen temporary tables versus the four shown here.<br />
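The pivot part of the job (one row per key, one column per metal) can also be sketched in plain Python, which can be handy for checking the SQL output by eye. The function name is hypothetical, and the input below is the cut-level gradesx data for cuts 1 and 2 from the mock data; the SQL further down pivots drift-level averages instead.<br />

```python
def pivot(rows):
    """Pivot long (key, column name, value) rows into {key: {column name: value}}."""
    wide = {}
    for key, colname, value in rows:
        wide.setdefault(key, {})[colname] = value
    return wide


gradesx_rows = [
    (1, 'Au g/tonne', 6.44), (1, 'Pt g/tonne', 0.54), (1, 'Pergium g/tonne', 15.23),
    (2, 'Au g/tonne', 7.83), (2, 'Pt g/tonne', 0.77), (2, 'Pergium g/tonne', 4.22),
]
columns = ['Pergium g/tonne', 'Au g/tonne', 'Pt g/tonne']

for cutid, grades in sorted(pivot(gradesx_rows).items()):
    # .get() leaves None where a grade is missing instead of raising KeyError.
    print(cutid, [grades.get(c) for c in columns])
# prints:
# 1 [15.23, 6.44, 0.54]
# 2 [4.22, 7.83, 0.77]
```

One design difference worth noting: the SQL version uses INNER JOINs, so a drift missing any one grade drops out of the result entirely, while this sketch keeps it and shows None in the gap.<br />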
<br />
<b>Sanitized Mock Up Scenario:</b> there's no need to go into depth on our vendor's database schema or the specific technical problem - both are a tad complicated. I like doing tonne-weighted averages with code but it's not everyone's cup of tea. In the interest of simplifying this whole thing and making it more fun, I've based it on the old <a href="https://en.wikipedia.org/wiki/The_Devil_in_the_Dark" target="_blank">Star Trek Episode Devil in the Dark</a> about an underground mine on a distant planet.<br />
<br />
<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfIa9f1XPZFWAPtE_VgAwhq9a9Ax7XQER9_oOFpBXoPvXcbMlHaNBhw04LRJaRKxflDYtqDOFlfb1XwTZ5vzhzmadCKnVwR8YtcLOP_3l0Sj_AKTqgkhVNYoGDddtbDqyyBuGiBcuYUdA/s1600/kirkfacesthehorta.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="247" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfIa9f1XPZFWAPtE_VgAwhq9a9Ax7XQER9_oOFpBXoPvXcbMlHaNBhw04LRJaRKxflDYtqDOFlfb1XwTZ5vzhzmadCKnVwR8YtcLOP_3l0Sj_AKTqgkhVNYoGDddtbDqyyBuGiBcuYUdA/s320/kirkfacesthehorta.JPG" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<b>Mock Data:</b> we're modeling mined out areas and associated tonnages of rock bearing pergium, gold, and platinum in economic concentrations. (I don't know what pergium is, but it was worth enough that going to war with Mother Horta seemed like a good idea). Here is some code to create the tables and fill in the data (highly simplified schema - each mined out area is a "cut").</div>
<br />
SQL Server 2008 R2 (Express) table creation and mock data SQL code. I'm not showing the autogenerated database creation code - it's lengthy - suffice it to say the database name is JanusVIPergiumMine. Also, for the sake of simplicity, there are no keys in the tables.<br />
<br />
<span style="color: #660000; font-family: "Courier New", Courier, monospace;">USE JanusVIPergiumMine;</span><br />
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: #660000; font-family: "Courier New", Courier, monospace;">CREATE TABLE cuts (<br /> cutid INT,<br /> cutname VARCHAR(50),<br /> monthx VARCHAR(30),<br /> yearx INT);</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: #660000; font-family: "Courier New", Courier, monospace;">CREATE TABLE cutattributes (<br /> cutid INT,<br /> attributex VARCHAR(50),<br /> valuex VARCHAR(50));</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: #660000; font-family: "Courier New", Courier, monospace;">CREATE TABLE tonnes(<br /> cutid INT NULL,<br /> tonnes FLOAT);</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: #660000; font-family: "Courier New", Courier, monospace;">CREATE TABLE dbo.gradesx(<br /> cutid int NULL,<br /> gradename varchar(50) NULL,<br /> gradex float NULL);</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: #660000; font-family: "Courier New", Courier, monospace;">DELETE FROM cuts;</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: #660000; font-family: "Courier New", Courier, monospace;">INSERT INTO cuts<br /> VALUES (1, 'HappyPergium1', 'April', 2015),<br /> (2, 'HappyPergium12', 'April', 2015),<br /> (3, 'VaultofTomorrow1', 'April', 2015),<br /> (4, 'VaultofTomorrow2', 'April', 2015),<br /> (5, 'Children1', 'April', 2015),<br /> (6, 'Children2', 'April', 2015),<br /> (7, 'VandenbergsFind1', 'April', 2015),<br /> (8, 'VandenbergsFind2', 'April', 2015);</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: #660000; font-family: "Courier New", Courier, monospace;">DELETE FROM cutattributes;</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: #660000; font-family: "Courier New", Courier, monospace;">INSERT INTO cutattributes<br /> VALUES (1, 'Drift', 'Level23East'),<br /> (2, 'Drift', 'Level23East'),<br /> (3, 'Drift', 'Level23West'),<br /> (4, 'Drift', 'Level23West'),<br /> (5, 'Drift', 'BabyHortasCutEast'),<br /> (6, 'Drift', 'BabyHortasCutEast'),<br /> (7, 'Drift', 'BabyHortasCutWest'),<br /> (8, 'Drift', 'BabyHortasCutWest');</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: #660000; font-family: "Courier New", Courier, monospace;">DELETE FROM tonnes;</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: #660000; font-family: "Courier New", Courier, monospace;">INSERT INTO tonnes<br /> VALUES (1, 28437.0),<br /> (2, 13296.0),<br /> (3, 13222.0),<br /> (4, 6473.0),<br /> (5, 6744.0),<br /> (6, 8729.0),<br /> (7, 10030.0),<br /> (8, 2345.0);</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: #660000; font-family: "Courier New", Courier, monospace;">DELETE FROM gradesx;</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: #660000; font-family: "Courier New", Courier, monospace;">INSERT INTO gradesx<br /> VALUES (1, 'Au g/tonne', 6.44),<br /> (1, 'Pt g/tonne', 0.54),<br /> (1, 'Pergium g/tonne', 15.23),<br /> (2, 'Au g/tonne', 7.83),<br /> (2, 'Pt g/tonne', 0.77),<br /> (2, 'Pergium g/tonne', 4.22),<br /> (3, 'Au g/tonne', 0.44),<br /> (3, 'Pt g/tonne', 3.54),<br /> (3, 'Pergium g/tonne', 2.72),<br /> (4, 'Au g/tonne', 0.87),<br /> (4, 'Pt g/tonne', 2.87),<br /> (4, 'Pergium g/tonne', 1.11),<br /> (5, 'Au g/tonne', 12.03),<br /> (5, 'Pt g/tonne', 0.33),<br /> (5, 'Pergium g/tonne', 10.01),<br /> (6, 'Au g/tonne', 8.72),<br /> (6, 'Pt g/tonne', 1.38),<br /> (6, 'Pergium g/tonne', 5.44),<br /> (7, 'Au g/tonne', 7.37),<br /> (7, 'Pt g/tonne', 1.59),<br /> (7, 'Pergium g/tonne', 4.05),<br /> (8, 'Au g/tonne', 3.33),<br /> (8, 'Pt g/tonne', 0.98),<br /> (8, 'Pergium g/tonne', 3.99);</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: inherit;"><b>Python Code to Run the Dump/ETL to CSV:</b> this is essentially a single sqlcmd session, started with subprocess.Popen and fed through its stdin, with bcp called from inside sqlcmd. What made this particularly brittle and hairy is the manner in which the lifetime of temporary tables is determined in MSSQL: a global temporary table is dropped once the session that created it ends. To get the temporary table with my results to persist, I had to wrap its creation inside a process. I'm ignorant as to the internal workings of buffers and memory here, but the MSSQL sqlcmd commands do not execute or write to disk exactly when you might expect them to. Nothing is really completed until the process hosting sqlcmd is killed.</span></div>
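One possible explanation for that timing, offered as a hypothesis rather than something confirmed against this script: the stdin pipe Popen creates is buffered by default, so stdin.write() calls without a flush() can sit in Python's buffer until the pipe is closed, which would make the sleeps do very little. An alternative sketch builds the whole sqlcmd script up front and hands it over with communicate(), which writes the input, closes stdin and waits for sqlcmd to exit. Everything still runs in one sqlcmd session, so the global temporary tables should survive until the :!! bcp line has run. The function names are hypothetical, and the sketch sticks to calls available in Python 3.4.<br />

```python
import subprocess

GO = "\nGO\n"


def build_sqlcmd_script(statements, bcp_line=None):
    """Join T-SQL batches with GO separators and finish with exit."""
    parts = [stmt.strip() + GO for stmt in statements]
    if bcp_line is not None:
        # sqlcmd's :!! command runs an operating system command (here, bcp).
        parts.append(bcp_line + "\n")
    parts.append("exit\n")
    return "".join(parts)


def run_sqlcmd(sqlcmd_argv, script, outfile, errfile, timeout=600):
    """Send the whole script to sqlcmd at once; return sqlcmd's exit code."""
    with open(outfile, "w") as out, open(errfile, "w") as err:
        proc = subprocess.Popen(sqlcmd_argv, stdin=subprocess.PIPE,
                                stdout=out, stderr=err)
        # communicate() writes the input, closes stdin and waits for exit,
        # so nothing is left sitting in a Python-side buffer.
        proc.communicate(script.encode("utf-8"), timeout=timeout)
    return proc.returncode
```

With the script below, that would look something like run_sqlcmd(SQLCMDEXE, build_sqlcmd_script(create_and_fill_statements, BCPSTR), ERRX-style file names...), where create_and_fill_statements is a hypothetical list of the formatted CREATE and INSERT strings.<br />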
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
At work I actually got the bcp format file generated on the fly - I wasn't able to reproduce this behavior for this mock exercise. Instead, I generated a bcp format file for the target table dump "by hand" and put the file in my working directory.</div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
As I show further on, this SQL data dump will be run from a button within an Excel spreadsheet.</div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: inherit;">Mr. Spock, or better said, Horta Mother says it best:</span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: inherit;"><br /><br /><span style="font-size: large;"><i><b>Subprocesses, sqlcmd, bcp, Excel . . .<br /><br />PAAAAAIIIIIIIN!</b></i></span></span></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgW453c7IpltRAuViE3xehwuiEfGBL8ZMzes3rMulJtPnCorwrMjdMDhADILDuIcALnmfID4kkGBSi4FieN-0cMlvbBj5Cgu1kjycYYIY7QNr9xSKLtUi_ntOJVNf4oFTFvx75CBakyrds/s1600/pain.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="218" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgW453c7IpltRAuViE3xehwuiEfGBL8ZMzes3rMulJtPnCorwrMjdMDhADILDuIcALnmfID4kkGBSi4FieN-0cMlvbBj5Cgu1kjycYYIY7QNr9xSKLtUi_ntOJVNf4oFTFvx75CBakyrds/s320/pain.JPG" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">#!C:\Python34\python</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: Courier New;"># blogsqlcmdpull.py</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># XXX<br /># Changed my laptop's name to MYLAPTOP.</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: Courier New;"># Yours will be whatever your computer</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: Courier New;"># name is.</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">import os<br />import subprocess as subx<br />import shlex<br />import time</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">import argparse</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># Need to make sure you are in proper Windows directory.<br /># Can vary from machine to machine based on<br /># environment variables.</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># Googled StackOverflow.<br /># 5137497/find-current-directory-and-files-directory<br />EXCELDIR = os.path.dirname(os.path.realpath(__file__))<br />os.chdir(EXCELDIR)<br />print('\nCurrent directory is {:s}'.format(os.getcwd()))</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">parser = argparse.ArgumentParser()<br /># 7 character argument like 'Apr2015'<br /># Feed in at command line<br />parser.add_argument('monthyear',<br /> help='seven characters, month abbreviation plus year (Apr2015)',<br /> type=str)<br />args = parser.parse_args()<br />MONTHYEAR = args.monthyear</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># Use Peoplesoft/company id so that more than<br /># one user can run this at once if necessary<br /># (note: will not work if one user tries to<br /># run multiple instances at the same<br /># time - theoretically <not tested><br /># tables will get mangled and data<br /># will be corrupt.)<br />USER = os.getlogin()</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">CSVDUMPNAME = 'csvdumpname'<br />CSVDUMP = 'nohandjamovnumbersbcp'</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">CSVEXT = '.csv'</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">HOMESERVERNAME = 'homeservername'<br />LOCALSERVER = r'MYLAPTOP\SQLEXPRESS'</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">USERNAME = 'username'</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># Need to fill in month, year<br /># with input from Excel spreadsheet.<br />QUERYDICT = {'month':"'{:s}'",<br /> 'year':0,<br /> USERNAME:USER}</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># For sqlcmd and bcp<br />ERRORFILENAME = 'errorfilename'<br />STDOUTFILENAME = 'stdoutfilename'<br />ERRX = 'sqlcmderroutput.txt'<br />STDOUTX = 'sqcmdoutput.txt'<br />EXIT = '\nexit\n'<br />UTF8 = 'utf-8'<br />GOX = '\nGO\n'</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># 2 second pause.<br />PAUSEX = 2</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">SLEEPING = '\nsleeping {pause:d} seconds . . .\n'</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># XXX - Had to generate this bcp format file<br /># from table in MSSQL Management Studio -<br /># dos command line:<br /># bcp ##TARGETX format nul -f test.fmt -S MYLAPTOP\SQLEXPRESS -t , -c -T</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: "Courier New", Courier, monospace;"><br /><span style="color: blue;"># XXX - you can programmatically extract<br /># column names from the bcp format<br /># file or<br /># you can dump them from SQLServer<br /># with a separate query in bcp - <br /># I have done neither here<br /># (I hardcoded them).<br />FMTFILE = 'formatfile'<br />COLBCPFMTFILE = 'bcp.fmt'</span></span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">CMDLINEDICT = {HOMESERVERNAME:LOCALSERVER,<br /> 'exit':EXIT,<br /> CSVDUMPNAME:CSVDUMP,<br /> ERRORFILENAME:ERRX,<br /> STDOUTFILENAME:STDOUTX,<br /> 'go':GOX,<br /> USERNAME:USER,<br /> 'pause':PAUSEX,<br /> FMTFILE:COLBCPFMTFILE}</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue;"><span style="font-family: "Courier New", Courier, monospace;"># Startup for sqlcmd interactive mode.<br />SQLPATH = r'C:\Program Files\Microsoft SQL Server'<br />SQLPATH += r'\100\Tools\Binn\SQLCMD.exe'<br />SQLCMDEXE = [SQLPATH]<br />SQLCMDARGS </span><span style="font-family: "Courier New", Courier, monospace;">= shlex.split(</span></span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"> ('-S{homeservername:s}'.format(**CMDLINEDICT)),<br /> posix=False)<br />SQLCMDEXE.extend(SQLCMDARGS)</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">BCPSTR = ':!!bcp "SELECT * FROM ##TARGETX{username:s};" '<br />BCPSTR += 'queryout {csvdumpname:s}.csv -t , '<br />BCPSTR += '-f {formatfile:s} -S {homeservername:s} -T'<br />BCPSTR = BCPSTR.format(**CMDLINEDICT)</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">def cleanslate():<br /> """<br /> Delete files from previous runs.<br /> """<br /> # XXX - only one file right now.<br /> files = [CSVDUMP + CSVEXT]<br /> for filex in files:<br /> if os.path.exists(filex) and os.path.isfile(filex):<br /> os.remove(filex)<br /> return 0</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">MONTHS = {'Jan':'January',<br /> 'Feb':'February',<br /> 'Mar':'March',<br /> 'Apr':'April',<br /> 'May':'May',<br /> 'Jun':'June',<br /> 'Jul':'July',<br /> 'Aug':'August',<br /> 'Sep':'September',<br /> 'Oct':'October',<br /> 'Nov':'November',<br /> 'Dec':'December'}</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">def parseworkbookname():<br /> """<br /> Get month (string) and year (integer)<br /> from name of workbook (Apr2015).</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"> Return as month, year 2 tuple.<br /> """<br /> # XXX<br /> # Write this out - will eventually <br /> # need error checking/try-catch<br /> monthx = MONTHS[MONTHYEAR[:3]]<br /> yearx = int(MONTHYEAR[3:])<br /> return monthx, yearx</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># Global Temporary Tables<br />TONNESTEMPTBL = """<br />CREATE TABLE ##TONNES{username:s} (<br /> yearx INT,<br /> monthx VARCHAR(30),<br /> cutid INTEGER,<br /> drift VARCHAR(30),<br /> tonnes FLOAT);<br />"""</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">FILLTONNES = """<br />USE JanusVIPergiumMine;</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">DECLARE @DRIFT CHAR(5) = 'Drift';</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">INSERT INTO ##TONNES{username:s}<br /> SELECT cutx.yearx,<br /> cutx.monthx,<br /> cutx.cutid,<br /> cutattrx.valuex AS drift,<br /> tonnesx.tonnes<br /> FROM cuts cutx<br /> INNER JOIN cutattributes cutattrx<br /> ON cutx.cutid = cutattrx.cutid<br /> INNER JOIN tonnes tonnesx<br /> ON cutx.cutid = tonnesx.cutid<br /> WHERE cutx.yearx = {year:d} AND<br /> cutx.monthx = {month:s} AND<br /> cutattrx.attributex = @DRIFT;<br />"""</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">GRADESTEMPTBL = """<br />CREATE TABLE ##GRADES{username:s} (<br /> cutid INTEGER,<br /> drift VARCHAR(30),<br /> gradenamex VARCHAR(50),<br /> graden FLOAT);</span><span style="font-family: "Courier New", Courier, monospace;"><br /><span style="color: blue;">"""</span></span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">FILLGRADES = """<br />USE JanusVIPergiumMine; </span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">DECLARE @DRIFT CHAR(5) = 'Drift';</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">INSERT INTO ##GRADES{username:s}<br /> SELECT cutx.cutid,<br /> cutattrx.valuex AS drift,<br /> gradesx.gradename,<br /> gradesx.gradex<br /> FROM cuts cutx<br /> INNER JOIN cutattributes cutattrx<br /> ON cutx.cutid = cutattrx.cutid<br /> INNER JOIN gradesx<br /> ON cutx.cutid = gradesx.cutid<br /> WHERE cutx.yearx = {year:d} AND<br /> cutx.monthx = {month:s} AND<br /> cutattrx.attributex = @DRIFT;<br />"""</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># Sum and tonne-weighted averages<br />MONTHLYPRODDATASETTEMPTBL = """<br />CREATE TABLE ##MONTHLYPRODDATASET{username:s} (<br /> yearx INT,<br /> monthx VARCHAR(30),<br /> drift VARCHAR(30),<br /> tonnes FLOAT,<br /> gradename VARCHAR(50),<br /> grade FLOAT);<br />"""</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">FILLMONTHLYPRODDATASET = """<br />INSERT INTO ##MONTHLYPRODDATASET{username:s}<br /> SELECT tonnesx.yearx,<br /> tonnesx.monthx,<br /> tonnesx.drift,<br /> SUM(tonnesx.tonnes) AS tonnes,<br /> gradesx.gradenamex AS gradename,<br /> SUM(tonnesx.tonnes * gradesx.graden)/<br /> SUM(tonnesx.tonnes) AS graden<br /> FROM ##TONNES{username:s} tonnesx<br /> INNER JOIN ##GRADES{username:s} gradesx<br /> ON tonnesx.cutid = gradesx.cutid<br /> GROUP BY tonnesx.yearx,<br /> tonnesx.monthx,<br /> tonnesx.drift,<br /> gradesx.gradenamex;<br />"""</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># Pivot<br />TARGETXTEMPTBL = """<br />CREATE TABLE ##TARGETX{username:s} (<br /> yearx INT,<br /> monthx VARCHAR(30),<br /> drift VARCHAR(30),<br /> tonnes FLOAT,<br /> pergium FLOAT,<br /> Au FLOAT,<br /> Pt FLOAT);<br />"""</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">FILLTARGETX = """<br />DECLARE @PERGIUM CHAR(15) = 'Pergium g/tonne';<br />DECLARE @GOLD CHAR(10) = 'Au g/tonne';<br />DECLARE @PLATINUM CHAR(10) = 'Pt g/tonne';</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">INSERT INTO ##TARGETX{username:s}<br /> SELECT mpds.yearx,<br /> mpds.monthx,<br /> mpds.drift,<br /> MAX(mpds.tonnes) AS tonnes,<br /> MAX(perg.grade) AS pergium,<br /> MAX(au.grade) AS Au,<br /> MAX(pt.grade) AS Pt<br /> FROM ##MONTHLYPRODDATASET{username:s} mpds<br /> INNER JOIN ##MONTHLYPRODDATASET{username:s} perg<br /> ON perg.drift = mpds.drift AND<br /> perg.gradename = @PERGIUM<br /> INNER JOIN ##MONTHLYPRODDATASET{username:s} au<br /> ON au.drift = mpds.drift AND<br /> au.gradename = @GOLD<br /> INNER JOIN ##MONTHLYPRODDATASET{username:s} pt<br /> ON pt.drift = mpds.drift AND<br /> pt.gradename = @PLATINUM<br /> GROUP BY mpds.yearx,<br /> mpds.monthx,<br /> mpds.drift<br /> ORDER BY mpds.drift;<br />"""</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># 1) Create global temp tables.<br /># 2) Fill global temp tables.<br /># 3) Get desired result set into the target global temp table.<br /># 4) Run bcp against target global temp table.<br /># 5) Drop global temp tables.<br />CREATETABLES = {1:TONNESTEMPTBL,<br /> 2:GRADESTEMPTBL,<br /> 3:MONTHLYPRODDATASETTEMPTBL,<br /> 4:TARGETXTEMPTBL}</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">FILLTABLES = {1:FILLTONNES,<br /> 2:FILLGRADES,<br /> 3:FILLMONTHLYPRODDATASET,<br /> 4:FILLTARGETX}</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">def getdataincsvformat():<br /> """<br /> Retrieve data from MSSQL server.<br /> Dump into csv text file.<br /> """<br /> numtables = len(CREATETABLES)<br /> with open('{errorfilename:s}'.format(**CMDLINEDICT), 'w') as e:<br /> with open('{stdoutfilename:s}'.format(**CMDLINEDICT), 'w') as f:<br /> sqlcmdproc = subx.Popen(SQLCMDEXE, stdin=subx.PIPE,<br /> stdout=f, stderr=e)<br /> for i in range(numtables):<br /> cmdx = (CREATETABLES[i + 1]).format(**QUERYDICT)<br /> print(cmdx)<br /> sqlcmdproc.stdin.write(bytes(cmdx +<br /> '{go:s}'.format(**CMDLINEDICT), UTF8))<br /> # stdin is buffered; flush so sqlcmd gets the<br /> # batch now rather than after the pause.<br /> sqlcmdproc.stdin.flush()<br /> print(SLEEPING.format(**CMDLINEDICT))<br /> time.sleep(PAUSEX)<br /> for i in range(numtables):<br /> cmdx = (FILLTABLES[i + 1]).format(**QUERYDICT)<br /> print(cmdx)<br /> sqlcmdproc.stdin.write(bytes(cmdx +<br /> '{go:s}'.format(**CMDLINEDICT), UTF8))<br /> sqlcmdproc.stdin.flush()<br /> print(SLEEPING.format(**CMDLINEDICT))<br /> time.sleep(PAUSEX)<br /> print('bcp csv dump command (from inside sqlcmd) . . .')<br /> sqlcmdproc.stdin.write(bytes(BCPSTR, UTF8))<br /> sqlcmdproc.stdin.flush()<br /> print(SLEEPING.format(**CMDLINEDICT))<br /> time.sleep(PAUSEX)<br /> sqlcmdproc.stdin.write(bytes('{exit:s}'.format(**CMDLINEDICT), UTF8))<br /> # Close stdin and wait for sqlcmd to finish<br /> # before the output files are closed.<br /> sqlcmdproc.stdin.close()<br /> sqlcmdproc.wait()<br /> return 0</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;"> </span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">monthx, yearx = parseworkbookname()</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: "Courier New", Courier, monospace;"><br /><span style="color: blue;"># Get rid of previous files.<br />print('\ndeleting files from previous runs . . .\n')<br />cleanslate()</span></span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: "Courier New", Courier, monospace;"><br /><span style="color: blue;"># Get month and year into query dictionary.<br />QUERYDICT['month'] = QUERYDICT['month'].format(monthx)<br />QUERYDICT['year'] = yearx</span></span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: "Courier New", Courier, monospace;"><br /><span style="color: blue;">getdataincsvformat()</span></span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: "Courier New", Courier, monospace;"><br /><span style="color: blue;">print('done')</span></span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div style="clear: both; text-align: justify;">
<span style="color: black; font-family: inherit;">It's ugly, but it works.<br /><br />In keeping with the Horta theme, this would be a good spot for an image break:</span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirRW7DzLNG0ID1SosYEfZes0KpKkWqsDrHRBtEL8NrhBKpUODuuLuThS5V8mLaizfq90W80SQavySGvCeOLGxzsVSDush1gSqpPlO44kvMACxZYaZ2eTkE7cz8YkLA1Loh5ypT3HmdGFg/s1600/bricklayer.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="227" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirRW7DzLNG0ID1SosYEfZes0KpKkWqsDrHRBtEL8NrhBKpUODuuLuThS5V8mLaizfq90W80SQavySGvCeOLGxzsVSDush1gSqpPlO44kvMACxZYaZ2eTkE7cz8YkLA1Loh5ypT3HmdGFg/s320/bricklayer.JPG" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<b><i>Damn it, Jim, I'm a geologist, not a database programmer.<br /><br />You're an analyst, analyze.</i></b></div>
<div align="justify" class="separator" style="clear: both; text-align: center;">
<br /></div>
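Back on the SQL side for a moment: the fixed time.sleep(PAUSEX) pauses in getdataincsvformat() exist to give sqlcmd time to work through each batch. A minimal sketch of another approach, assuming sqlcmd runs the batches in order as it reads them from standard input: build the whole script up front, then hand it over with communicate(), which writes the input, closes stdin and waits for the process to exit. The names buildsqlscript and runsqlcmd are hypothetical, and the separator and exit command are passed in because the CMDLINEDICT values the post uses are not shown in this excerpt.

```python
import subprocess

# GO is sqlcmd's batch separator. The post keeps its separator and exit
# command in CMDLINEDICT, whose exact values are not shown here, so both
# are parameters (this default is an assumption).
GO = '\nGO\n'


def buildsqlscript(createtables, filltables, querydict, bcpstr, exitcmd,
                   go=GO):
    """Join every step into one sqlcmd script, in dictionary-key order."""
    batches = []
    for steps in (createtables, filltables):
        for key in sorted(steps):
            batches.append(steps[key].format(**querydict) + go)
    batches.append(bcpstr)
    batches.append(exitcmd)
    return ''.join(batches)


def runsqlcmd(cmdline, script, stdoutpath, stderrpath):
    """Pipe the whole script to sqlcmd and wait for it to exit.

    communicate() writes the script, closes stdin and waits; this assumes
    sqlcmd finishes each batch before reading the next one.
    """
    with open(stdoutpath, 'w') as out, open(stderrpath, 'w') as err:
        proc = subprocess.Popen(cmdline, stdin=subprocess.PIPE,
                                stdout=out, stderr=err)
        proc.communicate(script.encode('utf-8'))
    return proc.returncode
```

Wired into the script above, the call would look something like runsqlcmd(SQLCMDEXE, buildsqlscript(CREATETABLES, FILLTABLES, QUERYDICT, BCPSTR, CMDLINEDICT['exit'], CMDLINEDICT['go']), CMDLINEDICT['stdoutfilename'], CMDLINEDICT['errorfilename']). Iterating over sorted keys also removes the need for the range(numtables) and [i + 1] indexing.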
<div class="separator" style="clear: both; text-align: justify;">
<b>Load to Excel:</b> this part is fairly straightforward - COM programming with Mark Hammond and company's venerable win32com library. On the laptop I'm writing this blog entry on, the only working copy of win32com was for the Python 2.5 release bundled with an old version of our mine planning software (<a href="http://www.minesight.com/" target="_blank">MineSight/Hexagon</a>) - the show must go on!</div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="color: blue; font-family: "Courier New", Courier, monospace;">#!C:\MineSight\mpython</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># blognohandjamnumberspython2.5.py</span><br />
<span style="color: blue; font-family: Courier New;"></span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># mpython is Python 2.5 on this machine.</span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># Had to remove collections.namedtuple<br /># (used dictionary instead) and new<br /># string formatting (reverted to use<br /># of the percent sign (%) for string interpolation).</span><br />
<span style="color: blue; font-family: Courier New;"># Lastly, did not have argparse at my</span><br />
<span style="color: blue; font-family: Courier New;"># disposal.</span><br />
<span style="color: blue; font-family: Courier New;"></span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;">"""<br />Get numbers into spreadsheet<br />without having to hand jam<br />everything.<br />"""</span><br />
<span style="color: blue; font-family: Courier New;"></span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;">from __future__ import with_statement</span><br />
<span style="color: blue;"></span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;">import os<br />from win32com.client import Dispatch</span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># Plan on receiving Excel file's<br /># path from call from Excel workbook.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"><br /><span style="color: blue;">import sys</span></span><br />
<span style="color: blue; font-family: Courier New;"></span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># Path to Excel workbook.<br />WB = sys.argv[1]<br /># Worksheet name.<br />WSNAME = sys.argv[2]</span><br />
<span style="color: blue; font-family: Courier New;"></span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;">BACKSLASH = '\\'</span><br />
<span style="font-family: "Courier New", Courier, monospace;"><br /><span style="color: blue;"># Looking for data file in current directory.<br /># (same directory as Python script)<br />CSVDUMP = 'nohandjamovnumbersbcp.csv'</span></span><br />
<span style="font-family: "Courier New", Courier, monospace;"><br /><span style="color: blue;"># XXX - repeated code from data dump file.<br />CURDIR = os.path.dirname(os.path.realpath(__file__))<br />os.chdir(CURDIR)<br />print('\nCurrent directory is %s' % os.getcwd())</span></span><br />
<span style="font-family: "Courier New", Courier, monospace;"><br /><span style="color: blue;"># XXX - I think there's a more elegant way to<br /># do this path concatenation with os.path.<br />CSVPATH = CURDIR + BACKSLASH + CSVDUMP</span></span><br />
<span style="color: blue; font-family: Courier New;"></span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># Fields in csv dump.<br />YEARX = 'yearx'<br />MONTHX = 'monthx'<br />DRIFT = 'drift'<br />TONNES = 'tonnes'<br />PERGIUM = 'pergium'<br />GOLD = 'Au'<br />PLATINUM = 'Pt'</span><br />
<span style="color: blue; font-family: Courier New;"></span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;">FIELDS = [YEARX,<br /> MONTHX,<br /> DRIFT,<br /> TONNES,<br /> PERGIUM,<br /> GOLD,<br /> PLATINUM]</span><br />
<span style="color: blue; font-family: Courier New;"></span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># Excel cells.<br /># Map this to csv dump and brute force cycle to fill in.<br />ROWCOL = '%s%d'</span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;">COLUMNMAP = dict((namex, colx) for namex, colx in<br /> zip(FIELDS, ['A', 'B', 'C', 'D',<br /> 'E', 'F', 'G']))</span><br />
<span style="color: blue; font-family: Courier New;"></span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;">EXCELX = 'Excel.Application'</span><br />
<span style="color: blue; font-family: Courier New;"></span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;">def getcsvdata():<br /> """<br /> Puts csv data (CMP dump) into<br /> a list of data structures<br /> and returns list.<br /> """<br /> with open(CSVPATH, 'r') as f:<br /> records = []<br /> for linex in f:<br /> # XXX - print for debugging/information<br /> print([n.strip() for n in linex.split(',')])<br /> records.append(dict(zip(FIELDS,<br /> (n.strip() for n<br /> in linex.split(',')))))<br /> return records</span><br />
<span style="color: blue; font-family: Courier New;"></span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;"># Put Excel stuff here.<br />def getworkbook(workbooks):<br /> """<br /> Get handle to desired workbook.<br /> """<br /> for x in workbooks:<br /> print(x.FullName)<br /> if x.FullName == WB:<br /> # XXX - debug/information print statement<br /> print('EUREKA')<br /> return x<br /> # No match: fail loudly instead of returning<br /> # whichever workbook happened to be last.<br /> raise ValueError('Workbook not open in Excel: %s' % WB)</span><br />
<span style="color: blue; font-family: Courier New;"></span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;">def fillinspreadsheet(records):<br /> """<br /> Fill in numbers in spreadsheet.</span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;"> Side effect function.</span><br />
<span style="color: blue; font-family: "Courier New", Courier, monospace;"> records is a list of dictionaries<br /> keyed by field name (see getcsvdata).<br /> """<br /> excelx = Dispatch(EXCELX)<br /> wb = getworkbook(excelx.Workbooks)<br /> ws = wb.Worksheets.Item(WSNAME)<br /> # Start entering data at row 4.<br /> row = 4<br /> for recordx in records:<br /> for x in FIELDS:<br /> column = COLUMNMAP[x]<br /> valuex = recordx[x]<br /> cellx = ws.Range(ROWCOL % (column, row))<br /> # Selection makes pasting of new value visible.<br /> # I like this - not everyone does. YMMV<br /> cellx.Select()<br /> cellx.Value = valuex<br /> # On to the next record on the next row.<br /> row += 1<br /> # Come back to origin of worksheet at end.<br /> ws.Range('A1').Select()<br /> return 0<br /> <br />cmprecords = getcsvdata()<br />fillinspreadsheet(cmprecords)</span><br />
<span style="color: blue;"><span style="font-family: "Courier New", Courier, monospace;">print('done')</span></span><br />
<span style="color: blue; font-family: Courier New;"></span><br />
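The XXX comments in the script point at two spots the standard library can tidy up. A minimal sketch, not the code used for this post: readrecords, cellassignments and csvpath are hypothetical helper names, while the field order, the A-G columns and the starting row 4 come from the script. csv.reader copes with quoted fields that a bare split(',') would break on, and os.path.join supplies the path separator, so no BACKSLASH constant is needed. Values stay strings, leaving type conversion to Excel as the post does. It avoids str.format and namedtuple, though whether it runs unchanged under mpython's Python 2.5 is an untested assumption.

```python
import csv
import os

# Field order of the bcp csv dump, as in the script's FIELDS list.
FIELDS = ['yearx', 'monthx', 'drift', 'tonnes', 'pergium', 'Au', 'Pt']
COLUMNS = 'ABCDEFG'


def readrecords(lines, fields=FIELDS):
    """Turn csv lines into dictionaries keyed by field name.

    csv.reader handles quoted commas that a plain split(',') would not.
    """
    records = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines, e.g. a trailing empty line
        records.append(dict(zip(fields, (value.strip() for value in row))))
    return records


def cellassignments(records, fields=FIELDS, startrow=4):
    """Return (cell address, value) pairs, one record per worksheet row."""
    pairs = []
    for offset, record in enumerate(records):
        row = startrow + offset
        for column, field in zip(COLUMNS, fields):
            pairs.append(('%s%d' % (column, row), record[field]))
    return pairs


def csvpath(scriptfile, csvname='nohandjamovnumbersbcp.csv'):
    """Locate the csv dump next to the script; os.path.join picks the separator."""
    return os.path.join(os.path.dirname(os.path.realpath(scriptfile)), csvname)
```

In fillinspreadsheet, a single loop over cellassignments(records) setting ws.Range(address).Value would stand in for the nested record and field loops.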
<span style="color: black; font-family: inherit;">On to the VBA code (macros) inside the Excel workbook that runs the Python scripts:<br /><br /><span style="color: #20124d; font-family: "Courier New", Courier, monospace;">Option Explicit</span></span><br />
<span style="color: #20124d; font-family: "Courier New", Courier, monospace;"></span><br />
<span style="color: #20124d; font-family: "Courier New", Courier, monospace;">Const EXECX = "C:\Python34\python "<br />Const EXECXII = "C:\MineSight\mpython\python\2.5\python "<br />Const EXCELSCRIPT = "blognohandjamnumberspython2.5.py "<br />Const SQLSCRIPT = "blogsqlcmdpull.py "</span><br />
<br />
<span style="color: #20124d; font-family: "Courier New", Courier, monospace;">Sub FillInNumbers()</span><br />
<span style="color: #20124d; font-family: "Courier New", Courier, monospace;"></span><br />
<span style="color: #20124d; font-family: "Courier New", Courier, monospace;"> Dim namex As String<br /> Dim wb As Workbook<br /> Dim ws As Worksheet<br /> <br /> Dim longexecstr As String<br /> <br /> Set ws = Selection.Worksheet<br /> 'Try to get current worksheet name to feed values to query.<br /> namex = ws.Name<br /> <br /> longexecstr = EXECXII & " " & ActiveWorkbook.Path<br /> longexecstr = longexecstr & Chr(92) & EXCELSCRIPT<br /> longexecstr = longexecstr & ActiveWorkbook.Path & Chr(92) & ActiveWorkbook.Name<br /> longexecstr = longexecstr & " " & namex</span><br />
<span style="color: #20124d; font-family: "Courier New", Courier, monospace;"> VBA.Interaction.Shell longexecstr, vbNormalFocus<br /> <br />End Sub</span><br />
<br />
<span style="color: #20124d; font-family: "Courier New", Courier, monospace;">Sub GetSQLData()</span><br />
<span style="color: #20124d; font-family: "Courier New", Courier, monospace;"> Dim namex As String<br /> Dim ws As Worksheet<br /> <br /> Set ws = Selection.Worksheet<br /> 'Try to get current worksheet name to feed values to query.<br /> namex = ws.Name</span><br />
<span style="color: #20124d; font-family: "Courier New", Courier, monospace;"> VBA.Interaction.Shell EXECX & ActiveWorkbook.Path & _<br /> Chr(92) & SQLSCRIPT & namex, vbNormalFocus<br /> <br />End Sub</span><br />
<span style="color: #20124d; font-family: Courier New;"></span><br />
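Both macros build the command line by plain string concatenation, and the Python scripts then read the workbook path and sheet name from sys.argv. If ActiveWorkbook.Path ever contains a space, Windows splits that path into separate arguments, so sys.argv[1] would hold only a fragment. A minimal sketch of what a safely quoted command line looks like, using the subprocess module's list2cmdline helper (present in CPython but not part of its documented API); buildcommandline and the example paths are hypothetical. On the VBA side the equivalent would be wrapping each piece in Chr(34) double quotes.

```python
import subprocess


def buildcommandline(python, script, workbookpath, sheetname):
    """Build the command line FillInNumbers assembles, but quoted.

    list2cmdline applies the Microsoft C runtime quoting rules, wrapping
    any argument that contains a space or tab in double quotes.
    """
    return subprocess.list2cmdline([python, script, workbookpath, sheetname])


if __name__ == '__main__':
    # Hypothetical paths: a workbook folder with a space in its name.
    print(buildcommandline(r'C:\MineSight\mpython\python\2.5\python',
                           r'C:\Mine Reports\blognohandjamnumberspython2.5.py',
                           r'C:\Mine Reports\production.xlsm',
                           'Sheet1'))
```

Arguments without spaces pass through untouched, so the quoting changes nothing for the paths used in this post.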
<span style="color: black; font-family: inherit;">I always use Option Explicit in my VBA code - that's not particularly pythonic, but being pythonic inside the VBA interpreter can be hazardous. As always, YMMV.<br /><br />Lastly, a rough demo and a data check. We'll run the SQL dump from the top button on the Excel worksheet:</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjF8yZG9_gZAMAwm6V9utQAG0PD8ZRjA5TsMQs6IbCGtldbqZyYYXvFe1YuCiSWgEuDkVCge0d7zxwP7fniQuLfBaiPr6U9HfggyFmYDaY_tZm4bRLIUGhdbJUhAChzx3NKQDGHWLrYkrA/s1600/sqldump.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="468" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjF8yZG9_gZAMAwm6V9utQAG0PD8ZRjA5TsMQs6IbCGtldbqZyYYXvFe1YuCiSWgEuDkVCge0d7zxwP7fniQuLfBaiPr6U9HfggyFmYDaY_tZm4bRLIUGhdbJUhAChzx3NKQDGHWLrYkrA/s640/sqldump.JPG" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div align="justify" class="separator" style="clear: both; text-align: center;">
<br /></div>
<div align="justify" class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
And now we'll click the lower button to put the data into the spreadsheet. It's worth noting that my Python code does no type conversion on the text coming out of the SQL csv dump; Excel handles that for you. It's not free software (Excel/Office) - might as well get your money's worth.</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVep0PrTyEl0nQxRZRbNLcNgAjBlDbwY8lf4Mk6zA-lZSg2dxzJsj6u6u8ROL_53nPdt-cGVz1poWGq60VwMc0WNs0KY7a09PgxmWSAyslxtQ00d2XVpK6Pm-jnEyh9HoAmrzTWfZY3Ug/s1600/results.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="314" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVep0PrTyEl0nQxRZRbNLcNgAjBlDbwY8lf4Mk6zA-lZSg2dxzJsj6u6u8ROL_53nPdt-cGVz1poWGq60VwMc0WNs0KY7a09PgxmWSAyslxtQ00d2XVpK6Pm-jnEyh9HoAmrzTWfZY3Ug/s640/results.jpg" width="640" /></a></div>
<br />
We'll do a check on the first row for tonnes and a pergium grade. Going back to our original data:<br />
<br />
Cuts 1 and 2 belong to the drift Level23East.<br />
<br />
Tonnes:<br />
<br />
<span style="font-family: "Courier New", Courier, monospace;">VALUES (1, 28437.0),<br /> (2, 13296.0),</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: Courier New;">Total: 41733</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: inherit;">Looks good; the tonnes sum is right. Now the tonne-weighted average:</span><br />
<br />
Pergium:<br />
<br />
<span style="font-family: "Courier New", Courier, monospace;">(1, 'Pergium g/tonne', 15.23),<br />(2, 'Pergium g/tonne', 4.22),</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: Courier New;">(28437 * 15.23 + 13296 * 4.22)/41733 = 11.722</span><br />
<span style="font-family: Courier New;"></span><br />
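The same hand check can be scripted. This mirrors the SUM(tonnesx.tonnes * gradesx.graden)/SUM(tonnesx.tonnes) expression in FILLMONTHLYPRODDATASET; the function name weightedgrade is hypothetical, and the numbers are the cut 1 and 2 values listed above.

```python
def weightedgrade(cuts):
    """Tonne-weighted average grade: sum(tonnes * grade) / sum(tonnes)."""
    totaltonnes = sum(tonnes for tonnes, _ in cuts)
    return sum(tonnes * grade for tonnes, grade in cuts) / totaltonnes


# Cuts 1 and 2 (drift Level23East): (tonnes, pergium g/tonne).
LEVEL23EAST = [(28437.0, 15.23), (13296.0, 4.22)]

if __name__ == '__main__':
    print(sum(tonnes for tonnes, _ in LEVEL23EAST))  # 41733.0
    print(round(weightedgrade(LEVEL23EAST), 3))      # 11.722
```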
<span style="font-family: inherit;">It checks out. Do a few more checks and send it out to the Janus VI Pergium Mine mine manager.<br /><br /><b>Notes:</b></span><br />
<b></b><br />
This is a messy one-off mousetrap. That said, this is often how the sausage gets made in a non-programming, non-professional development environment. We do have an in-house Python developer, Lori. Often she's given something like this and told to clean it up and make it into an in-house app. That's challenging. Ideally, the mining professional writing the one-off and the developer get together and cross-educate each other on the domain space (mining) and the developer space (programming, good software design and practice). It's a lot of fun, but the first go-around is seldom pretty.<br />
<br />
Thanks for stopping by.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4Ninok3aRcuM54aUc-aWrLXihtrYQ6zGiEbxcVKxB2k6ALpF_Jlv2ZTYhg9EkjqEAuvWTWIDNcMfFsfU-XWutMk1JeYlOalu5knIr1fQR9rg6Wam2SKur5DY0yglpBp-1d9eOLADprdU/s1600/shelikedmyears.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="223" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4Ninok3aRcuM54aUc-aWrLXihtrYQ6zGiEbxcVKxB2k6ALpF_Jlv2ZTYhg9EkjqEAuvWTWIDNcMfFsfU-XWutMk1JeYlOalu5knIr1fQR9rg6Wam2SKur5DY0yglpBp-1d9eOLADprdU/s400/shelikedmyears.JPG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<b><span style="font-size: large;">Leonard Nimoy</span></b></div>
<div class="separator" style="clear: both; text-align: center;">
<b><span style="font-size: large;">1931 - 2015</span></b></div>
Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com1tag:blogger.com,1999:blog-524230429673765509.post-48115020108485586732015-05-17T08:34:00.000-07:002015-05-17T08:34:03.355-07:00Lenovo Thinkpad X201 Fan ReplacementThis is not a Python-related post per se, but it may be useful to people getting started with UNIX-based, open source software, or even a Windows user who happens to be using a Thinkpad X201 laptop.<br /><br />Background:<br />
<br />
1) I use OpenBSD as my operating system because I am striving to learn UNIX, and I find it the best fit for that purpose.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_9yehr7kQUSg8nmGa-S7rrhuKnUr1uDvda7daLB5Pw2r9VkGJE87dImm_G8cC2Ke6eSzfnxNwByO-6vzz8AhQyznLFCq0vSsbIgZmaCv_q5xC_b9olj7Ho1Asc91LoCO_UvKiwMrSN24/s1600/ball-big-fish.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_9yehr7kQUSg8nmGa-S7rrhuKnUr1uDvda7daLB5Pw2r9VkGJE87dImm_G8cC2Ke6eSzfnxNwByO-6vzz8AhQyznLFCq0vSsbIgZmaCv_q5xC_b9olj7Ho1Asc91LoCO_UvKiwMrSN24/s320/ball-big-fish.jpg" width="320" /></a></div>
<br />
<br />
<br />
<br />
<br />
2) The venerable Thinkpad line of laptops (originally IBM, now Lenovo) tends to be among the best supported by OpenBSD and the other BSD development communities (a small but loyal developer and user base).<br />
<br />
3) I buy my Thinkpads refurb'd because they're cheaper that way.<br />
<br />
4) Laptop parts only last so long before they start failing, more so with refurbished ones. It was easy to replace the hard drive; the fan is more complicated because you have to take a lot more of the laptop apart to reach it.<br />
<br />
5) I'm a bit mechanically challenged and tend to break things permanently when I try to fix them. I hope this post helps others overcome the same lack of confidence and fear.<br />
<br />
There is a really good step-by-step still-frame photo series on the web showing how to take apart a Thinkpad X201 and replace the fan. I used it extensively during this task:<br /><br /><a href="http://www.myfixguide.com/manual/lenovo-thinkpad-x201-disassembly-clean-cooling-fan-remove-keyboard/">http://www.myfixguide.com/manual/lenovo-thinkpad-x201-disassembly-clean-cooling-fan-remove-keyboard/</a><br />
<br />
A walkthrough of my experience and a few notes:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1_I1f6zqbQ98hlPU1ISijfHBYLkUMBThbd-umF5Vf1GkhA6eNtNjHSyg_43kWDAN-7DJ8Jt2MsH4nA-TV0Fb2VOCB-4R_Uvso36SXv-ubA3efeVd2CEcGQYVyIUZbWHRr8CVWWre19PY/s1600/dexterscrews.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="92" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1_I1f6zqbQ98hlPU1ISijfHBYLkUMBThbd-umF5Vf1GkhA6eNtNjHSyg_43kWDAN-7DJ8Jt2MsH4nA-TV0Fb2VOCB-4R_Uvso36SXv-ubA3efeVd2CEcGQYVyIUZbWHRr8CVWWre19PY/s320/dexterscrews.jpg" width="320" /></a></div>
<br />
<br />
<br />
Mr. Dexter's Star Wars joke about <a href="http://en.wikipedia.org/wiki/Henry_F._Phillips" target="_blank">Phillips head</a> screws (actually bolts) notwithstanding, stripping those little guys is a problem. I was lucky this time. In my model train adventures, I've been less so.<br /><br />There are few things more annoying than a deep-seated little Phillips head bolt or screw. The thought of taking a power drill to a laptop to extract one of these makes me a bit nervous. Fortunately, just before RadioShack went bankrupt a few months back, I found a nice set of long-shaft Phillips head screwdrivers at one of their stores in Tucson. Those tools have been indispensable.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyskFJSxGpaZb2y0AbFu5wNaVGdYFYTWu2Oy7JKgIY3a7R8ZS2Rlflhw6y1XhQXE8qwpvChOoLm7LEaGIfy6CPDTx2gTvOH_gvvRmVU2WDzrUhxu9BY1Ko7LsjYJGMERt9zibbyFA0d1Y/s1600/testingfan2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyskFJSxGpaZb2y0AbFu5wNaVGdYFYTWu2Oy7JKgIY3a7R8ZS2Rlflhw6y1XhQXE8qwpvChOoLm7LEaGIfy6CPDTx2gTvOH_gvvRmVU2WDzrUhxu9BY1Ko7LsjYJGMERt9zibbyFA0d1Y/s320/testingfan2.jpg" width="320" /></a></div>
<br />
<br />
<br />
This is the laptop after I got the fan hooked up. A few years back, someone on an internet forum asked how to test the fan, and people just kept trolling him and laughing at him. Here is how it's done: hook everything up (in my case I only needed the power, screen, and keyboard) without actually putting the laptop back together (those zillion Phillips head bolts!) and boot up. It's hard to see, but the fan is happily whirring away over there on the left.<br /><br />It's weird operating on a machine you're used to having in one piece - reminiscent of those <a href="http://en.memory-alpha.org/wiki/Time%27s_Arrow_%28episode%29" target="_blank">scenes in STTNG where they take apart LCDR Data</a> ("Data and <a href="http://en.memory-alpha.org/wiki/Commander" title="Commander">Commander</a> <a href="http://en.memory-alpha.org/wiki/William_T._Riker" title="William T. Riker">Riker</a> are in <a href="http://en.memory-alpha.org/wiki/Engineering" title="Engineering">engineering</a> examining Data's head.")<br /><br />Heating up:<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg90GzmCMpW7fOJ5F4RKw8uXBfbt2TNPtlyDc00AwPAPYpev0SZbauOSIX12l76sxRqTdux99fW_cLmYtqGKeop2o9eo7kBTO35NmHnhpTLu071wjZSZ80YBkocMsJ2qd0CONsubWhvkFY/s1600/heatingup2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg90GzmCMpW7fOJ5F4RKw8uXBfbt2TNPtlyDc00AwPAPYpev0SZbauOSIX12l76sxRqTdux99fW_cLmYtqGKeop2o9eo7kBTO35NmHnhpTLu071wjZSZ80YBkocMsJ2qd0CONsubWhvkFY/s320/heatingup2.jpg" width="320" /></a></div>
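The temperature readings on that screen come from OpenBSD's sysctl hw.sensors tree. A small sketch for reading one sensor from Python, assuming the usual name=value output line (for example hw.sensors.cpu0.temp0=52.00 degC). The sensor name cpu0.temp0 is an assumption and may differ on your machine (run sysctl hw.sensors to list what is there), and parsesensor and readtemperature are hypothetical helpers.

```python
import subprocess


def parsesensor(line):
    """Split an OpenBSD sysctl sensor line into (name, value, unit).

    Assumes the name=value form, e.g. 'hw.sensors.cpu0.temp0=52.00 degC'.
    """
    name, _, reading = line.strip().partition('=')
    value, _, unit = reading.partition(' ')
    return name, float(value), unit


def readtemperature(sensor='hw.sensors.cpu0.temp0'):
    """Run sysctl once and return the sensor value (hypothetical sensor name)."""
    output = subprocess.check_output(['sysctl', sensor])
    return parsesensor(output.decode('ascii'))[1]
```

Calling readtemperature() twice a short interval apart would put a number on how fast the bare machine heats up, rather than eyeballing it between commands.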
<br />
<br />
You don't want to run the computer too long in this state (without the fan in its proper place and the machine put back together). Thinkpads, and the X201 in particular, are notorious for running hot. You can see from the screen that the machine heats up by about a degree Celsius in the time it takes me to type the next sysctl command/query.<br /><br />sudo shutdown -hp now<br /><br />I put it all back together, and the only (well, not really - see below) thing different was a slight nick on the keyboard:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjswJCrAGsxBbSYPdhevP6yI__Tjp7cdV1qdwvunXe_O7n1P_j7xH530fGOHgzHvUdR7GxTW6JmIJdla1T2cwYzsaDCHeU-VAI4uSIUcpFXsCNfoTgqukJBbNzGZwAREWt4sQNCzhfCNTw/s1600/nickonkeyboard.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjswJCrAGsxBbSYPdhevP6yI__Tjp7cdV1qdwvunXe_O7n1P_j7xH530fGOHgzHvUdR7GxTW6JmIJdla1T2cwYzsaDCHeU-VAI4uSIUcpFXsCNfoTgqukJBbNzGZwAREWt4sQNCzhfCNTw/s320/nickonkeyboard.jpg" width="320" /></a></div>
<br />
Hey! Where did these "extra" screws (bolts) come from?! Uh-oh . . .<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivoJOm743bCbeZC4hq8ymK80irvTUpaXjJasD-G-UNj3S-bdHFovCTngWoWQbLGmGYGeypnt4fVNiyIa_YHg1aOG5mAvOQh-vLfxq6qrFWhen_QcvthtF1yZSFC5USWZVxpxRG9-3At1Q/s1600/extrascrews.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivoJOm743bCbeZC4hq8ymK80irvTUpaXjJasD-G-UNj3S-bdHFovCTngWoWQbLGmGYGeypnt4fVNiyIa_YHg1aOG5mAvOQh-vLfxq6qrFWhen_QcvthtF1yZSFC5USWZVxpxRG9-3At1Q/s320/extrascrews.jpg" width="320" /></a></div>
. . . and the sound doesn't work either - looks like I was a bit too hasty in putting this thing back together. We'll give it another try . . .<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYKGpbh4fUTbM0XRCAsPERg4rrroMrGybjsRzgwTbfYYCXizh1q0uu4PND1KznnShrEjzfQ4qd4_qLxSEgym2igGylS9TqnVb4WvC6auJwgvRH0-vX6B6F4cxkzv_Itqtv6YA-Wrc6Oyg/s1600/missingscrewslots.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYKGpbh4fUTbM0XRCAsPERg4rrroMrGybjsRzgwTbfYYCXizh1q0uu4PND1KznnShrEjzfQ4qd4_qLxSEgym2igGylS9TqnVb4WvC6auJwgvRH0-vX6B6F4cxkzv_Itqtv6YA-Wrc6Oyg/s320/missingscrewslots.jpg" width="320" /></a></div>
. . . it looks like some of those extra bolts hold that important piece of aluminum in place . . .<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3QkjBQjyOdn42FYOPXcvFU7zBRpDhshDmvmP7C12ZYGhqUS64Zvdl3xVaF1OFFhW3lRVQ3g9PSunIoLjwEMGIyFOYeworZxKjcptdKhvCkeWwSUuXaDGfkfojdN8X7fN9o3_PEoAV9U0/s1600/missingscrewslots2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3QkjBQjyOdn42FYOPXcvFU7zBRpDhshDmvmP7C12ZYGhqUS64Zvdl3xVaF1OFFhW3lRVQ3g9PSunIoLjwEMGIyFOYeworZxKjcptdKhvCkeWwSUuXaDGfkfojdN8X7fN9o3_PEoAV9U0/s320/missingscrewslots2.jpg" width="320" /></a></div>
<br />
<br />
<br />
. . . and a few more over here . . .<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdqFGnKjlPLP8wF6G9PBtmWqZckCn5c26BM4ob7AdglM92_VzTtK5-fCm1FKsbuLVGUCk455ehxp60UBEIKeVRXQopVXamFDOSru-L88w6f2BoXwply8hPJEnspJzkKrNjNKi5Qo5q5b8/s1600/soundcarddisconnected.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdqFGnKjlPLP8wF6G9PBtmWqZckCn5c26BM4ob7AdglM92_VzTtK5-fCm1FKsbuLVGUCk455ehxp60UBEIKeVRXQopVXamFDOSru-L88w6f2BoXwply8hPJEnspJzkKrNjNKi5Qo5q5b8/s320/soundcarddisconnected.jpg" width="320" /></a></div>
. . . uh, OK, that's my problem with the sound :-(<br /><br />That sound card connection is paper thin - I think that's why it has to be secured with a little snap-in thingy in the picture. Hardware is pretty amazing sometimes. To all my electrical engineering friends: I bow to you.<br />
<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjowso48y3qrcjg9PLF3J6NdxTxM1ORaWoaMHvlGRg5TKwSAZcIJC4of85nQViHFU9W1fnfUZ8muambkoHkv0Iu8N2PvswKsYGmo8PpkrynE9NWA2tM1VgefVVzFPiLt2txBKe1ZsCL3Os/s1600/playinglongwaytothetop.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjowso48y3qrcjg9PLF3J6NdxTxM1ORaWoaMHvlGRg5TKwSAZcIJC4of85nQViHFU9W1fnfUZ8muambkoHkv0Iu8N2PvswKsYGmo8PpkrynE9NWA2tM1VgefVVzFPiLt2txBKe1ZsCL3Os/s320/playinglongwaytothetop.jpg" width="320" /></a></div>
<br />
This time I tested the laptop again, but for sound. Doing what computer techs do (taking apart laptops and fixing them) . . .<br />
<br />
<i>I tell you folks<br />
It's harder than it looks</i><br />
<br />
(Sorry, had to.)<br /><br />After I got all the screws (bolts) put back in (save 4 - I have no idea and I'm leaving good enough alone), I still (thought) I had a problem with the sound. It turns out I've been through this before. On UNIX-based systems the X201 mute key behaves oddly. This link explains it a bit for a Debian Linux system. Whether OpenBSD is different under the hood or not, the behavior the user sees is essentially the same:<br />
<br /><a href="http://www.stderr.nl/Blog/Hardware/Thinkpad/WeirdMuteButtonBehaviour.html" target="_blank">http://www.stderr.nl/Blog/Hardware/Thinkpad/WeirdMuteButtonBehaviour.html</a><br />
<br />
And I was good to go!<br /><br />Hope this helps someone like me.<br /><br />Thanks for stopping by.<br />
<br />
<br /><br /><br /><br /><br /><br /><br />
Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com4tag:blogger.com,1999:blog-524230429673765509.post-28618637849974819512015-03-26T09:41:00.002-07:002015-03-26T09:41:54.392-07:00IE and Getting a Text File Off the Web - Selenium Web ToolsI've blogged <a href="http://pyright.blogspot.com/2014/08/internet-explorer-9-save-dialog.html" target="_blank">previously</a> about getting information off a distant server on my employer's internal SharePoint site. Automating this can be a little challenging, especially when something in the setup changes.<br /><br />My new desktop showed up with Internet Explorer 11 and Windows 7 Enterprise. When I went to run my <a href="http://www.minesight.com/" target="_blank">MineSight</a> multirun (basically a batch file with a GUI front end that our mine planning vendor provides), the file fetch from our SharePoint site didn't work. A little googling led me to Selenium.<br /><br />As is often the case, I am wayyyy late to the party here. I remember Selenium from Pycon 2010 in Atlanta because they gave us a nice mug with new string formatting on it that I use frequently (both the mug and the formatting):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgP6pdMKHSgg__iiU7v5dr7HbSRnMTdE2NqkBBtpWiLUWoeCTjZ_CqmwA1Z_qbRG0-EZ9dp2qrXmqkYtz0w20zt5jFWQRzzX-fBSid-aDC29lkMGbtcvpgGYqgvibqvtgUnxnnBXjLfIuw/s1600/saucelabsmug.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgP6pdMKHSgg__iiU7v5dr7HbSRnMTdE2NqkBBtpWiLUWoeCTjZ_CqmwA1Z_qbRG0-EZ9dp2qrXmqkYtz0w20zt5jFWQRzzX-fBSid-aDC29lkMGbtcvpgGYqgvibqvtgUnxnnBXjLfIuw/s1600/saucelabsmug.JPG" height="240" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
I was at Pycon 2010 . . . and I have the mug to prove it.</div>
<div class="separator" style="clear: both; text-align: center;">
<br /><br /> </div>
<div class="separator" style="clear: both; text-align: left;">
My project manager/boss at the time, Eric, seeing me gush over the string formatting commands, did his usual button-pushing exercise by commenting, "I don't know; why didn't they put something on there like 'from pot import coffee'?" People, y'know?<br /><br />Back to Selenium - I was able to get what I needed from it with some research and downloading. The steps are basically:</div>
<div class="separator" style="clear: both; text-align: left;">
<br /><br /> </div>
<div class="separator" style="clear: both; text-align: left;">
1) Download <a href="http://selenium-release.storage.googleapis.com/index.html" target="_blank">IEDriverServer.exe</a></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
2) Put the executable in a directory on your PATH.</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
3) <a href="https://selenium-python.readthedocs.org/installation.html#downloading-python-bindings-for-selenium" target="_blank">Download Python Selenium Bindings and follow the install instructions</a>. I went the Python 3.4 route (versus the Python 2.7 that comes with MineSight) - personal preference on my part.<br /><br /> 4) Make sure your Internet Explorer environment/application is set up in a way that won't cause you problems. I could try to describe this, but this blog post from a Selenium developer does it so much better (complete with screenshots): <a href="http://jimevansmusic.blogspot.com/2012/08/youre-doing-it-wrong-protected-mode-and.html">http://jimevansmusic.blogspot.com/2012/08/youre-doing-it-wrong-protected-mode-and.html</a>. When Microsoft talks about "zones" and IE Protected Mode, the zones refer to things like "Trusted Sites," company web, external internet, etc. All of those zones have to have the same Protected Mode setting (the simplest route is to turn it on for every zone), or things won't work and you'll get a fairly cryptic error message when the script crashes.</div>
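Before Selenium gets involved, it can save some head-scratching to confirm that step 2 actually worked. Below is a minimal sketch, assuming Python 3.3 or later (for shutil.which from the standard library); find_driver is a hypothetical helper name, not part of Selenium:<br />

```python
import shutil

# Executable name from step 1; change it if your download differs.
DRIVER_EXE = 'IEDriverServer.exe'


def find_driver(name=DRIVER_EXE):
    """Return the full path to the driver if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == '__main__':
    location = find_driver()
    if location is None:
        print('{} is not on PATH - revisit step 2.'.format(DRIVER_EXE))
    else:
        print('Found {}'.format(location))
```

If this prints the "not on PATH" message, webdriver.Ie() will most likely fail too, so it is a cheap check to run first.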
<div class="separator" style="clear: both; text-align: left;">
<br /><br />For this example, I commented out the parts that are only needed inside the MineSight multirun. Within the multirun and app, the DOS window hangs and IEDriverServer stays open - I hacked around this by killing it with an os.system() call. Whatever it takes.</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
I couldn't efficiently get the script to recognize HTML tag names, so I hacked that with text processing. This is bad, but effective.</div>
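If the text-processing hack ever becomes a problem, the standard library's html.parser module can pull the text out of the pre element without slicing on string indices. This is a minimal sketch, not what my script does; PreTextParser and pre_text are hypothetical names, and it assumes (as the hack does) that the text file arrives wrapped in a pre element in page_source. HTMLParser lowercases tag names, so a page with upper-case tags still matches:<br />

```python
from html.parser import HTMLParser


class PreTextParser(HTMLParser):
    """Collect the text that appears inside pre elements."""

    def __init__(self):
        # convert_charrefs=True turns &amp; and friends back into
        # plain characters (it is already the default from Python 3.5).
        super().__init__(convert_charrefs=True)
        self._depth = 0    # greater than 0 while inside a pre element
        self.chunks = []   # text pieces found inside pre elements

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == 'pre':
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == 'pre' and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.chunks.append(data)


def pre_text(page_source):
    """Return the text of every pre element in page_source, joined together."""
    parser = PreTextParser()
    parser.feed(page_source)
    parser.close()
    return ''.join(parser.chunks)
```

In getbody, something like text = pre_text(browser.page_source) would stand in for the two index slices.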
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
The code:</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;">#!C:\Python34\python</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: Courier New; font-size: large;"></span> </div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;">"""<br />Get text from site via Internet Explorer.<br />"""</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: Courier New; font-size: large;"></span> </div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;">INST = 'instructions.txt'</span></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;"># For killing process inside Multirun.<br /># import os</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: Courier New; font-size: large;"></span> </div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;">from time import sleep as slpx</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: Courier New; font-size: large;"></span> </div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;">from selenium import webdriver</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: Courier New; font-size: large;"></span> </div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;"># XXX - hack - had difficulty getting<br /># things by tag - text processed it.<br />PRETAG = '<pre>'<br />PRETAGLEN = len(PRETAG)</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;"><br />PRETAGCLOSE = '</pre>'</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;"># Seconds to pause at end.<br />PAUSE = 3</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;">INSTRUCTIONS = 'http://ftp3.usa.openbsd.org/pub/OpenBSD/5.6/README'<br />INSTR = 'instructions.txt'</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: Courier New; font-size: large;"></span> </div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;"># XXX - \r versus \n may not matter in all<br /># cases, but for numbers in multirun, makeshift<br /># chomp processing made a difference.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;"><br />RETCHAR = '\r'</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: Courier New; font-size: large;"></span> </div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: Courier New; font-size: large;"># Hack to shutdown DOS window.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;"># TASKKILL = 'taskkill /im IEDriverServer.exe /F'</span></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;">def getbody(url):<br /> """<br /> Given the website address (url),<br /> returns the text inside the page's<br /> pre block as a list of chunks<br /> split on carriage returns.<br /> """<br /> browser = webdriver.Ie()<br /> browser.get(url)<br /> text = browser.page_source<br /> # quit() ends the WebDriver session and shuts<br /> # down IEDriverServer; close() only closes the window.<br /> browser.quit()<br /> text = text[(text.index(PRETAG) + PRETAGLEN):]<br /> text = text[:(text.index(PRETAGCLOSE))]<br /> # Splitting on \r here and rejoining with ''<br /> # later drops the carriage returns.<br /> return text.split(RETCHAR)</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: Courier New; font-size: large;"></span> </div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;">textii = getbody(INSTRUCTIONS)<br />print('\nDealing with writing of instructions file . . .\n')<br />textii = ''.join(textii)<br /># The with block closes the file even if write() fails.<br />with open(INSTR, 'w') as f:<br /> f.write(textii)</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;">print('Instructions copied.')</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;">print('\nPausing {:d} seconds . . .\n'.format(PAUSE))<br />slpx(PAUSE)</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="color: blue; font-family: "Courier New", Courier, monospace; font-size: large;"><br /># XXX - can't get window to close in Multirun (MXPERT) - CBT 23MAR2015<br /># os.system(TASKKILL)</span></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
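The \r versus \n business in the comments above can also be handled by str.splitlines(), which splits on \r\n, \n, and a bare \r alike (plus a few other separators). A minimal sketch of a makeshift chomp, assuming you also want trailing whitespace stripped from each line; chomp_lines and write_lines are hypothetical helper names:<br />

```python
def chomp_lines(text):
    r"""
    Split text on any line ending (\r\n, \n, or a bare \r) and
    strip trailing whitespace from each line.
    """
    return [line.rstrip() for line in text.splitlines()]


def write_lines(path, lines):
    r"""
    Write lines to path, one per line. In text mode Python
    translates each '\n' to the platform's line ending
    (\r\n on Windows).
    """
    with open(path, 'w') as f:
        f.write('\n'.join(lines) + '\n')
```

Stripping trailing whitespace matters most when a line is later fed to float() or compared against a number, which is where the multirun comment says chomping made a difference.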
Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com0tag:blogger.com,1999:blog-524230429673765509.post-36525774541616149672014-11-25T21:06:00.000-08:002015-03-26T08:38:57.063-07:00Polygon Offset Using Vector Math in IronPython<br />
The other day I saw something retweeted by @leppie (I think) about an experimental, hyper-fast, vector-math-driven 3D engine for the dot Net Framework. This led me to investigate whether the dot Net Framework has a built-in implementation of vector math. As it turns out, it does.<br />
<br />
This is of interest because (I think) it makes IronPython the only Python implementation with vector math included, no third-party library required. Jython can reach Java's java.util.Vector, but that class has nothing to do with vector math (it's a growable array). You do need to use the dot Net Framework instead of standard Python modules, but if you're running IronPython, you should have access to that anyway.<br />
<br />
The whole idea, or at least a big part of it, behind running a Python implementation against the dot Net Framework is that you can leverage that big library collection with a language that's fairly dense, easy, and doesn't require compilation.<br />
<br />
This was pretty easy on Windows. The only confusing part is that more than one dot Net assembly contributes types to the System.Windows namespace. You want the one in the WindowsBase dll; that's where our Vector object lives.<br />
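Because Vector.Multiply is overloaded (two Vectors give a dot product; a scalar and a Vector give a scaled Vector), it can help to see that dispatch spelled out. The class below is a hypothetical, minimal pure-Python stand-in that mirrors the names used in the code (Normalize, Add, Subtract, CrossProduct, Multiply, LengthSquared). It is not the dot Net type and only approximates its behavior for illustration:<br />

```python
import math


class Vector2(object):
    """Hypothetical stand-in for a few System.Windows.Vector operations."""

    def __init__(self, x, y):
        self.X = float(x)
        self.Y = float(y)

    @property
    def LengthSquared(self):
        return self.X * self.X + self.Y * self.Y

    def Normalize(self):
        """Scale this vector in place to length 1."""
        length = math.sqrt(self.LengthSquared)
        self.X /= length
        self.Y /= length

    @staticmethod
    def Add(a, b):
        return Vector2(a.X + b.X, a.Y + b.Y)

    @staticmethod
    def Subtract(a, b):
        return Vector2(a.X - b.X, a.Y - b.Y)

    @staticmethod
    def CrossProduct(a, b):
        """2D cross product: a scalar, positive if b is counterclockwise
        from a (with the y axis pointing up)."""
        return a.X * b.Y - a.Y * b.X

    @staticmethod
    def Multiply(a, b):
        """Vector * Vector -> dot product (float);
        scalar * Vector or Vector * scalar -> scaled Vector2."""
        if isinstance(a, Vector2) and isinstance(b, Vector2):
            return a.X * b.X + a.Y * b.Y
        if isinstance(a, Vector2):
            a, b = b, a  # put the scalar first
        return Vector2(a * b.X, a * b.Y)
```

The return type of Multiply depends on the argument types, which is exactly why the code below treats cs as a number and a1 as a Vector.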
<br />
The code follows. It includes the plotting by Gnuplot (I had to download the Windows version). I left out the monastery.py file with the original shape points in it. The writetofile.py file is almost exactly like the one from the previous post, except that for a Vector object the x and y attribute names are capitalized (X and Y):<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"># vecipy.py</span><br />
<span style="font-family: "Courier New",Courier,monospace;"><br /></span>
<span style="font-family: "Courier New",Courier,monospace;">"""<br />Polygon offset problem using<br />dot Net Framework.<br />"""<br /><br />import clr<br /><br />WINX = 'WindowsBase'<br /><br />clr.AddReference(WINX)<br /><br />from System.Windows import Vector<br /><br />import math<br />import copy<br /><br />import monastery as pic<br /><br />OFFSET = 0.15<br /><br />def scaleadd(origin, offset, vectorx):<br /> """<br /> From a Vector representing the origin,<br /> a scalar offset, and a Vector, returns<br /> a Vector object representing a point <br /> offset from the origin.<br /><br /> (Multiply vectorx by offset and add to origin.)<br /> """<br /> # Multiply method that takes scalar and Vector.<br /> multx = Vector.Multiply(vectorx, offset)<br /> return Vector.Add(multx, origin)<br /><br />def getinsetpoint(pt1, pt2, pt3):<br /> """<br /> Given three points that form a corner (pt1, pt2, pt3),<br /> returns a point offset distance OFFSET to the right<br /> of the path formed by pt1-pt2-pt3.<br /> <br /> pt1, pt2, and pt3 are two tuples.<br /> <br /> Returns a Vector object.<br /> """<br /> origin = Vector(*pt2)<br /> v1 = Vector(pt1[0] - pt2[0], pt1[1] - pt2[1])<br /> v1.Normalize()<br /> <br /> v2 = Vector(pt3[0] - pt2[0], pt3[1] - pt2[1])<br /> v2.Normalize()<br /> <br /> v3 = copy.copy(v1)<br /><br /> # For 2D Vectors CrossProduct returns a scalar (Double).<br /> v1 = Vector.CrossProduct(v1, v2)<br /><br /> v3 = Vector.Add(v3, v2)<br /> v3.Normalize()<br /> <br /> # In dotNet - Vector.Multiply is overloaded.<br /> # When it gets two Vector objects as arguments<br /> # it returns a dot product.<br /> cs = Vector.Multiply(v3, v2)<br /> <br /> # Again multiplication is overloaded.<br /> # Here it gets a scalar and a Vector<br /> # as arguments.<br /> a1 = Vector.Multiply(cs, v2)<br /> a2 = Vector.Subtract(v3, a1)<br /> <br /> if cs > 0:<br /> alpha = math.sqrt(a2.LengthSquared)<br /> else:<br /> alpha = -math.sqrt(a2.LengthSquared)<br /> <br /> if v1 < 0.0:<br /> return scaleadd(origin, -1.0 * OFFSET/alpha, v3)<br /> else:<br /> return scaleadd(origin, OFFSET/alpha, v3)<br /><br />def generatepoints():<br /> """<br /> Create list of offset points<br /> for points inset from polygon.<br /><br /> Return list.<br /> """<br /> polyinset = []<br /> lenpolygon = len(pic.MONASTERY)<br /> i = 0<br /> poly = pic.MONASTERY<br /> while i < lenpolygon - 2:<br /> polyinset.append(getinsetpoint(poly[i], <br /> poly[i + 1], poly[i + 2]))<br /> i += 1<br /> polyinset.append(getinsetpoint(poly[-2], <br /> poly[0], poly[1]))<br /> polyinset.append(getinsetpoint(poly[0], <br /> poly[1], poly[2]))<br /><br /> return polyinset</span><br />
<br />
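The geometry in getinsetpoint is easier to follow with plain tuples. The sketch below restates the same steps in pure Python (no dot Net, no pyeuclid): v3 bisects the corner, cs (the dot product of v3 and v2) is the cosine of half the corner angle, and the length of a2 (v3 minus its component along v2) is the sine of that half-angle. Moving OFFSET/alpha along the bisector therefore lands the point exactly OFFSET away from both edges. inset_point is a hypothetical name; this sketch raises ZeroDivisionError if the three points are collinear:<br />

```python
import math

OFFSET = 0.15


def inset_point(pt1, pt2, pt3, offset=OFFSET):
    """
    Return the point at perpendicular distance `offset` from both
    edges pt1-pt2 and pt2-pt3, on the right of the path pt1-pt2-pt3.
    """
    def unit(dx, dy):
        length = math.hypot(dx, dy)
        return dx / length, dy / length

    v1 = unit(pt1[0] - pt2[0], pt1[1] - pt2[1])  # toward the previous point
    v2 = unit(pt3[0] - pt2[0], pt3[1] - pt2[1])  # toward the next point
    cross = v1[0] * v2[1] - v1[1] * v2[0]        # sign gives the turn direction
    v3 = unit(v1[0] + v2[0], v1[1] + v2[1])      # unit bisector of the corner
    cs = v3[0] * v2[0] + v3[1] * v2[1]           # cos(half the corner angle)
    # Part of v3 perpendicular to v2; its length is sin(half-angle).
    a2 = (v3[0] - cs * v2[0], v3[1] - cs * v2[1])
    alpha = math.hypot(*a2) if cs > 0 else -math.hypot(*a2)
    # offset / sin(half-angle) along the bisector puts the point
    # `offset` away from each edge line; the sign picks the right side.
    dist = -offset / alpha if cross < 0.0 else offset / alpha
    return pt2[0] + dist * v3[0], pt2[1] + dist * v3[1]
```

For example, the corner (0, 1) to (0, 0) to (1, 0) heads down and then right, so the right-hand side is down and to the left, and the inset point comes out at about (-0.15, -0.15).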
<br />
<span style="font-family: "Courier New",Courier,monospace;"># writetofile.py</span><br />
<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">"""<br />Write vector points to file.<br /><br />Show in gnuplot.<br />"""<br /><br />import vecipy as vecx<br />import os<br /><br /># We're using gnuplot.<br /># It doesn't like commas, so<br /># we'll use whitespace (6).<br />FMT = '{0:30.28f} {1:30.28f}'<br />FILEX = 'points'<br />ORIGSHAPE = 'originalshape'<br /><br />PLOTCMD = 'set xrange[0.0:6.0]\n'<br />PLOTCMD += 'set yrange[0.0:6.0]\n'<br />PLOTCMD += 'plot "{0:s}" with lines lt rgb "red" lw 4, '<br />PLOTCMD += '"{1:s}" with lines lt rgb "blue" lw 4'<br />GNUPLOTFILE = 'plotfile'<br />GNUPLOT = 'gnuplot -p {:s}'.format(GNUPLOTFILE)<br /><br />pts = vecx.generatepoints()<br />f = open(FILEX, 'w')<br />i = 1<br />for ptx in pts:<br /> print('Printing point {0:d} . . .'.format(i))<br /> print >> f, FMT.format(ptx.X, ptx.Y)<br /> i += 1<br />f.close()<br /><br /># Plot original as well.<br />i = 0<br />f = open(ORIGSHAPE, 'w')<br />for ptx in vecx.pic.MONASTERY:<br /> print('Printing point {0:d} of original shape . . .'.format(i))<br /> print >> f, FMT.format(*ptx)<br /> i += 1<br />f.close()<br /><br />f = open(GNUPLOTFILE, 'w')<br />print >> f, PLOTCMD.format(ORIGSHAPE, FILEX)<br />f.close()<br />os.system(GNUPLOT)</span><br />
<span style="font-family: "Courier New",Courier,monospace;"></span><br />
<span style="font-family: "Courier New",Courier,monospace;"><br /></span>The result (shown in previous post):<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQjqbNEvHblcEGtdkptoXmGMeAgMJ2FUinTvVjo0lTFxiLlz7AbEy74uXFAkZ0_VeNCho-f_xVL68-21UQ6NFWARRZ9DHE9iW-YwA08MC1wBZjcX_otMxdslqJ6C4jj79Vehvv-ZJU3Sw/s1600/monastery.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQjqbNEvHblcEGtdkptoXmGMeAgMJ2FUinTvVjo0lTFxiLlz7AbEy74uXFAkZ0_VeNCho-f_xVL68-21UQ6NFWARRZ9DHE9iW-YwA08MC1wBZjcX_otMxdslqJ6C4jj79Vehvv-ZJU3Sw/s1600/monastery.png" height="232" width="320" /></a></div>
<br />
I run OpenBSD on my laptop at home, so my cross-platform experiment would use mono.<br />
<br />
Microsoft just recently (Fall 2014) announced the open sourcing of the dotNet Framework, along with cross-platform support for it. The mono project <a href="http://tirania.org/blog/archive/2014/Nov-12.html" target="_blank">responded</a> very positively to this announcement. I would imagine this is good news for IronPython too.<br />
<br />
OpenBSD has a package for mono. From there, I just needed to download the IronPython binaries and run mono against them, or so I thought . . .<br />
<br />
As it turns out, my script kept crashing on the overloaded Vector.Multiply method with a NotImplementedError. I tried to research it, wasn't having any luck, and brute-forced the problem by wrapping the method in a C# class I called vecx:<br />
<br />
<i>Note (26NOV2014): I hacked this C# module up a bit too quickly and didn't have performance or elegance in mind. If you declare those Multiply methods as static you can save yourself the trouble of instantiating a new instance of the class each time you want to call them. In fact, you can do the same thing with all the Vector methods you want to use (Add, CrossProduct, etc.). I was just too hurried and too lazy. CBT</i> <br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">using System;<br /><br />public class vecx<br />{<br /><br /> public System.Windows.Vector vectorx;<br /><br /> public vecx()<br /> {<br /> System.Windows.Vector vectorx = new System.Windows.Vector(0.0, 0.0);<br /> this.vectorx = vectorx;<br /> }<br /><br /> public vecx(double x, double y)<br /> {<br /> System.Windows.Vector vectorx = new System.Windows.Vector(x, y);<br /> this.vectorx = vectorx;<br /> }<br /><br /> public Double Multiply(System.Windows.Vector a, System.Windows.Vector b)<br /> {<br /> return System.Windows.Vector.Multiply(a, b);<br /> }<br /><br /> public System.Windows.Vector Multiply(Double a, System.Windows.Vector b)<br /> {<br /> return System.Windows.Vector.Multiply(a, b);<br /> }<br /><br /> public System.Windows.Vector Multiply(System.Windows.Vector a, Double b)<br /> {<br /> return System.Windows.Vector.Multiply(a, b);<br /> }<br /><br />}</span><br />
<br />
<br />
<br />
The command line for compiling this under mono was (your paths will probably be different):<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">$ mcs -r:/usr/local/lib/mono/4.5/WindowsBase.dll -target:library vecx.cs </span><br />
<br />
The code using this faux Vector class was a little bit different (and hackish):<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">"""<br />Polygon offset problem using<br />dot Net Framework.<br /><br />Modified for use with mono.<br />"""<br /><br />import clr<br /><br /># Hacked C# module.<br />VECX = '/home/carl/vectormath/IronPython/mono/vecx.dll'<br /><br />clr.AddReference(VECX)<br /><br />import vecx<br /><br />import math<br />import copy<br /><br />import monastery as pic<br /><br />OFFSET = 0.15<br /><br />def scaleadd(origin, offset, vectorx):<br /> """<br /> From a Vector representing the origin,<br /> a scalar offset, and a Vector, returns<br /> a Vector object representing a point <br /> offset from the origin.<br /><br /> (Multiply vectorx by offset and add to origin.)<br /> """<br /> # Generic vector for use of Vector type.<br /> vecgeneric = vecx().vectorx<br /><br /> # Multiply method that takes scalar and Vector.<br /> # Using cs module compiled to dll for Multiply<br /> # methods in mono.<br /> multx = vecx().Multiply(vectorx, offset)<br /> return vecgeneric.Add(multx, origin)<br /><br />def getinsetpoint(pt1, pt2, pt3):<br /> """<br /> Given three points that form a corner (pt1, pt2, pt3),<br /> returns a point offset distance OFFSET to the right<br /> of the path formed by pt1-pt2-pt3.<br /> <br /> pt1, pt2, and pt3 are two tuples.<br /> <br /> Returns a Vector object.<br /> """<br /> # Generic vector for use of type.<br /> vecgeneric = vecx().vectorx<br /><br /> origin = vecx(*pt2).vectorx<br /> v1 = vecx(pt1[0] - pt2[0], pt1[1] - pt2[1]).vectorx<br /> v1.Normalize()<br /> <br /> v2 = vecx(pt3[0] - pt2[0], pt3[1] - pt2[1]).vectorx<br /> v2.Normalize()<br /> <br /> v3 = copy.copy(v1)<br /><br /> v1 = vecgeneric.CrossProduct(v1, v2)<br /><br /> v3 = vecgeneric.Add(v3, v2)<br /> v3.Normalize()<br /> <br /> # In dotNet - Vector.Multiply is overloaded.<br /> # When it gets two Vector objects as arguments<br /> # it returns a dot product.<br /> # Using cs module compiled to dll for Multiply<br /> # methods in 
mono.<br /> cs = vecx().Multiply(v3, v2)<br /> <br /> # Again multiplication is overloaded.<br /> # Here it gets a scalar and a Vector<br /> # as arguments.<br /> # Using cs module compiled to dll for Multiply<br /> # methods in mono.<br /> a1 = vecx().Multiply(cs, v2)<br /> a2 = vecgeneric.Subtract(v3, a1)<br /> <br /> if cs > 0:<br /> alpha = math.sqrt(a2.LengthSquared)<br /> else:<br /> alpha = -math.sqrt(a2.LengthSquared)<br /> <br /> if v1 < 0.0:<br /> return scaleadd(origin, -1.0 * OFFSET/alpha, v3)<br /> else:<br /> return scaleadd(origin, OFFSET/alpha, v3)<br /><br />def generatepoints():<br /> """<br /> Create list of offset points<br /> for points inset from polygon.<br /><br /> Return list.<br /> """<br /> polyinset = []<br /> lenpolygon = len(pic.MONASTERY)<br /> i = 0<br /> poly = pic.MONASTERY<br /> while i < lenpolygon - 2:<br /> polyinset.append(getinsetpoint(poly[i], <br /> poly[i + 1], poly[i + 2]))<br /> i += 1<br /> polyinset.append(getinsetpoint(poly[-2], <br /> poly[0], poly[1]))<br /> polyinset.append(getinsetpoint(poly[0], <br /> poly[1], poly[2]))<br /><br /> return polyinset</span><br />
<br />
Any port in a storm or whatever it takes, as they say.<br />
<br />
Thanks again to Mr. Rafsanjani, whom I referenced in my previous post. His methodology, and his catching of a bug in my earlier code, got me back on track.<br />
<br />
And thank you for stopping by.<br />
<br />
<br />Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com0tag:blogger.com,1999:blog-524230429673765509.post-31450406997991332952014-11-16T00:09:00.001-08:002014-11-16T00:10:44.392-08:00Polygon Offset With pyeuclid RevisitedA few years back I did two or three posts on polygon offset. It was a
learning experience that I never quite completed to my satisfaction. A
kind visitor to my <a href="http://pyright.blogspot.com/2011/07/pyeuclid-vector-math-and-polygon-offset.html" target="_blank">last post</a> on the subject, Mr. <cite class="user"><a href="http://www.blogger.com/profile/10477183604973258516" rel="nofollow">Ahmad Rafsanjani</a></cite>,
actually rewrote some of my code in a comment. I gave him a polite
weasel answer thanking him, but dropped the effort and never felt quite
right about it.<br />
<br />
Well, as the saying goes, better late than never. He was quite correct in his assessment, but my understanding of vector math was not strong enough to prove this to myself. I was <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEih0EiMu9MWu8rFbWUqmGvciA28BgS21gZGvO190cuxlQSEbgmYIzyssiAX_5Y-wwOYRtrc5Qm9NkNTEv59LQSwPXIgznC_2TBIxqOjPUCu5nWdxV3H5YVxd0UXxS3iYYpfy0_DHVtvciU/s1600/monasteryeggsmooth.png" target="_blank">visually inspecting the results</a>, and, given what I was dealing with at the time, they seemed OK. <br />
<br />
Here is the picture we're trying to get (this one comes from Mr. Rafsanjani's code; the output of my original code, although wrong, doesn't look very different):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBIYcwPaZnyfcgaggbMOlNst73mMpSQ060pXK_18-AmSlOWmwCB24ZvlHudAFVFuS7SECNY0tBp_fEQgNRTw4luhg3fgjg8Zrs0aCC_jEp_mDqcHYzMbMwcsB2rmCEKRtKPBterQCWG90/s1600/monastery.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBIYcwPaZnyfcgaggbMOlNst73mMpSQ060pXK_18-AmSlOWmwCB24ZvlHudAFVFuS7SECNY0tBp_fEQgNRTw4luhg3fgjg8Zrs0aCC_jEp_mDqcHYzMbMwcsB2rmCEKRtKPBterQCWG90/s1600/monastery.png" height="232" width="320" /></a></div>
<br />
<br />
<br />
<br />
<br />
In order to nail down the discrepancy in my original code, I inserted some print statements with a lot of numeric precision (28 digits to the right of the decimal) in the output: <br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">$ more points<br />1.2231671842700024832595318003 1.7024195134850139687898717966<br />2.1231671842700023944416898303 1.7024195134850139687898717966<br />2.2768328157299975167404681997 2.54<span style="background-color: yellow;">75804865149860312101282034</span><br />1.6635803619063778135966913396 2.54<span style="background-color: yellow;">93839809701555054743948858</span><br />1.7364196380936223196300716154 3.35<span style="background-color: yellow;">06160190298444057077631442</span><br />2.5205825797292722434406186949 3.35<span style="background-color: yellow;">29128986463621053815131745</span><br />2.6794174202707274901058553951 4.1<span style="background-color: yellow;">470871013536383387076966756</span><br />2.1360193516544989655869812850 4.1<span style="background-color: yellow;">228847880778562995374159073</span></span><br />
<span style="font-family: "Courier New",Courier,monospace;"><br /></span>
<span style="font-family: "Courier New",Courier,monospace;">(etc.)</span><br />
<span style="font-family: "Courier New",Courier,monospace;"><br /></span>
<span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;">The numbers highlighted in yellow are mismatches in the Y coordinates of the inset offset polygon's points. Each pair of Y coordinates belongs to an edge that should be parallel to the X axis; in other words, the two values should be equal. I have a bug.</span></span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;">Contrast that with the numbers yielded by Mr. Rafsanjani's code:</span></span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">$ more points<br />1.2251864530113494300422871675 1.7000000000000001776356839400<br />2.1251864530113491191798402724 1.7000000000000001776356839400<br />2.2797319075568038826418160170 2.5499999999999998223643160600<br />1.6642549229616445671808833140 2.5499999999999998223643160600<br />1.7369821956889173186766583967 3.3500000000000000888178419700<br />2.5229705854077835169846366625 3.3500000000000000888178419700<br />2.6829705854077836590931838145 4.1500000000000003552713678801<br />2.1880983342360056376207921858 4.1500000000000003552713678801<br />2.6780983342360054066944030637 4.8499999999999996447286321199<br />3.1219016657639944156699129962 4.8499999999999996447286321199</span><br />
<span style="font-family: "Courier New",Courier,monospace;">(etc.)</span><br />
<span style="font-family: "Courier New",Courier,monospace;"><br /></span>
<span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;">Much better. Lines that are supposed to be parallel to the X axis now are: each pair of Y values is identical. (The 28 printed decimal places overstate the precision, though; a C double only holds about 15 to 17 significant decimal digits, and the rest of each printed number is just the exact decimal expansion of the stored binary value.) For what I am doing, I can more than live with that.</span></span><br />
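Rather than eyeballing long printouts, a tolerance-based comparison tests the "parallel to the X axis" property directly. A minimal sketch, assuming Python 3.5 or later (for math.isclose) and assuming, as in the rows shown above, that the points come in consecutive pairs that should share a Y value; parse_points and horizontal_edges_ok are hypothetical names:<br />

```python
import math


def parse_points(text):
    """Turn whitespace-separated 'x y' lines into (x, y) float tuples."""
    return [tuple(float(v) for v in line.split())
            for line in text.splitlines() if line.strip()]


def horizontal_edges_ok(points, rel_tol=1e-12):
    """
    Assumes points come in consecutive pairs (0-1, 2-3, ...) that
    should share a Y value. Returns True if every pair's Y values
    agree within rel_tol.
    """
    for i in range(0, len(points) - 1, 2):
        if not math.isclose(points[i][1], points[i + 1][1], rel_tol=rel_tol):
            return False
    return True
```

Fed the first four rows of each listing above, this flags the buggy output and passes the corrected one.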
<br />
<span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;">I've included Mr. Rafsanjani's comments in the code. My modifications to his code were mainly for the purpose of printing some things out and organizing the polygon offset part of this exercise into a module.<br /><br />I've made a separate main script for gnuplot. After not looking at everything for three years I realized I had forgotten everything I ever knew about gnuplot and wanted to record it this time. The file with the 20 points for the shape (monastery.py) is available on request.</span></span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;">Here is the main pyeuclid/polygon offset part of the code (rafsanjanicorrection.py):</span></span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">"""<br />Polygon offset problem using<br />pyeuclid and incorporating corrections<br />made by Ahmed Rafsanjani.<br />"""<br /><br /># Mr. Rafsanjani's comments:<br /><br /># I think there is a small bug:<br /><br /># In "getinsetpoint", the vector v3 should be<br /># normalized before passing to "scaleadd".<br /><br /># Furthermore, the final offset is not as the<br /># prescribed OFFSET and the angle between <br /># vectors should be taken into account.<br /><br /># A possible solution could be:<br /><br />import euclid as eu<br />import math<br />import copy<br /><br />import monastery as pic<br /><br />OFFSET = 0.15<br /><br />def scaleadd(origin, offset, vectorx):<br /> """<br /> From a vector representing the origin,<br /> a scalar offset, and a vector, returns<br /> a Vector3 object representing a point <br /> offset from the origin.<br /><br /> (Multiply vectorx by offset and add to origin.)<br /> """<br /> multx = vectorx * offset<br /> return multx + origin<br /><br />def getinsetpoint(pt1, pt2, pt3):<br /> """<br /> Given three points that form a corner (pt1, pt2, pt3),<br /> returns a point offset distance OFFSET to the right<br /> of the path formed by pt1-pt2-pt3.<br /> <br /> pt1, pt2, and pt3 are two tuples.<br /> <br /> Returns a Vector3 object.<br /> """<br /> origin = eu.Vector3(pt2[0], pt2[1], 0.0)<br /> v1 = eu.Vector3(pt1[0] - pt2[0], pt1[1] - pt2[1], 0.0)<br /> v1.normalize()<br /> <br /> v2 = eu.Vector3(pt3[0] - pt2[0], pt3[1] - pt2[1], 0.0)<br /> v2.normalize()<br /> <br /> v3 = copy.copy(v1)<br /> v1 = v1.cross(v2)<br /> v3 += v2<br /> v3.normalize()<br /> <br /> cs = v3.dot(v2)<br /> <br /> a1 = cs * v2<br /> a2 = v3 - a1<br /> <br /> if cs > 0:<br /> alpha = math.sqrt(a2.magnitude_squared())<br /> else:<br /> alpha =- math.sqrt(a2.magnitude_squared())<br /> <br /> if v1.z < 0.0:<br /> return scaleadd(origin, -1.0 * OFFSET/alpha, v3)<br /> else:<br /> return scaleadd(origin, 
OFFSET/alpha, v3)<br /><br />def generatepoints():<br /> """<br /> Create list of offset points<br /> (pyeuclid.Vector3 objects) for<br /> points inset from polygon.<br /><br /> Return list.<br /> """<br /> polyinset = []<br /> lenpolygon = len(pic.MONASTERY)<br /> i = 0<br /> poly = pic.MONASTERY<br /> while i < lenpolygon - 2:<br /> polyinset.append(getinsetpoint(poly[i], <br /> poly[i + 1], poly[i + 2]))<br /> i += 1<br /> polyinset.append(getinsetpoint(poly[-2], <br /> poly[0], poly[1]))<br /> polyinset.append(getinsetpoint(poly[0], <br /> poly[1], poly[2]))<br /><br /> return polyinset</span><br />
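Since pyeuclid is the piece tying this to Python 2.7, here is a minimal, dependency-free sketch of the same bisector construction using plain (x, y) tuples and the math module. The name inset_point is hypothetical, and it is not Mr. Rafsanjani's code: it mirrors getinsetpoint() step by step (unit edge vectors, bisector, sine of the half angle, sign from the cross product), but it raises ValueError for degenerate corners instead of dividing by zero. It assumes consecutive points are distinct.

```python
import math

OFFSET = 0.15


def inset_point(pt1, pt2, pt3, offset=OFFSET):
    """Return the (x, y) point `offset` units to the right of pt1-pt2-pt3.

    Walks along the corner's angle bisector a distance
    offset / sin(theta / 2), theta being the corner angle, so the
    result is `offset` away from both edges. Assumes pt1 != pt2 != pt3.
    """
    # Unit vectors from the corner pt2 back to pt1 and on to pt3.
    ux, uy = pt1[0] - pt2[0], pt1[1] - pt2[1]
    vx, vy = pt3[0] - pt2[0], pt3[1] - pt2[1]
    lu, lv = math.hypot(ux, uy), math.hypot(vx, vy)
    ux, uy, vx, vy = ux / lu, uy / lu, vx / lv, vy / lv

    # Bisector direction; it vanishes for a straight (180 degree) corner.
    bx, by = ux + vx, uy + vy
    lb = math.hypot(bx, by)
    if lb == 0.0:
        raise ValueError('straight corner: the bisector is undefined')
    bx, by = bx / lb, by / lb

    # Sine of the half angle = bisector component perpendicular to an edge.
    sin_half = abs(bx * vy - by * vx)
    if sin_half == 0.0:
        raise ValueError('folded-back corner: the offset is unbounded')

    # z of the cross product u x v picks which side is "right".
    dist = offset / sin_half
    if ux * vy - uy * vx < 0.0:
        dist = -dist
    return (pt2[0] + dist * bx, pt2[1] + dist * by)


# Path heading east, then turning north: right of it is south and east.
print(inset_point((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)))
```

For a square corner this lands at roughly (1.15, -0.15), i.e. 0.15 to the right of both legs, which is the behavior the corrected pyeuclid version is after.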
<br />
<span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;">The file that prints stuff out and summons gnuplot (writtofile.py):</span></span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;"><span style="font-family: "Courier New",Courier,monospace;">"""<br />Write vector points to file.<br /><br />Show in gnuplot.<br />"""<br /><br /># import blogpost as vecx<br />import rafsanjanicorrection as vecx<br />import os<br /><br /># We're using gnuplot.<br /># It doesn't like commas, so<br /># we'll use whitespace (6).<br />FMT = '{0:30.28f} {1:30.28f}'<br />FILEX = 'points'<br />ORIGSHAPE = 'originalshape'<br /><br />PLOTCMD = 'set xrange[0.0:6.0]\n'<br />PLOTCMD += 'set yrange[0.0:6.0]\n'<br />PLOTCMD += 'plot "{0:s}" with lines lt rgb "red" lw 4, '<br />PLOTCMD += '"{1:s}" with lines lt rgb "blue" lw 4'<br />GNUPLOTFILE = 'plotfile'<br />GNUPLOT = 'gnuplot -p {:s}'.format(GNUPLOTFILE)<br /><br />pts = vecx.generatepoints()<br />f = open(FILEX, 'w')<br />i = 1<br />for ptx in pts:<br /> print('Printing point {0:d} . . .'.format(i))<br /> print >> f, FMT.format(ptx.x, ptx.y)<br /> i += 1<br />f.close()<br /># Plot original as well.<br /># XXX - repetetive - make function.<br />i = 0<br />f = open(ORIGSHAPE, 'w')<br />for ptx in vecx.pic.MONASTERY:<br /> print('Printing point {0:d} of original shape . . .'.format(i))<br /> print >> f, FMT.format(ptx[0], ptx[1])<br /> i += 1<br />f.close()<br /><br />f = open(GNUPLOTFILE, 'w')<br />print >> f, PLOTCMD.format(ORIGSHAPE, FILEX)<br />f.close()<br />os.system(GNUPLOT)</span> </span></span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;">pyeuclid, to the best of my knowledge, runs only in Python 2.7 at the moment. In any case, I got an error on the Python 3.4 install with setup.py so I stuck with 2.7.</span></span><br />
<span style="font-family: "Courier New",Courier,monospace;"><br /></span>
<span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;">Thanks to Mr. Rafsanjani for his help with this and for the rest of you for stopping by.</span></span><br />
<br />
<br />Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com2tag:blogger.com,1999:blog-524230429673765509.post-53708330578915978282014-11-03T10:56:00.000-08:002014-11-04T15:47:08.176-08:00MeetBSD California 2014 Recap<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj33tioCM07Z2B29lsbNpVVbW2gknSbY8_TzQZskI-ziPpkvyaz2Jy8QHvTYKU2v6-pA7QMOnobAefcEsFuq6PQHBdHfhEQZOxCdDaOOFRF4b1NmRl3aht8K2l0SRHUPRCvlexbggZKAiE/s1600/bsdsurfertee.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj33tioCM07Z2B29lsbNpVVbW2gknSbY8_TzQZskI-ziPpkvyaz2Jy8QHvTYKU2v6-pA7QMOnobAefcEsFuq6PQHBdHfhEQZOxCdDaOOFRF4b1NmRl3aht8K2l0SRHUPRCvlexbggZKAiE/s1600/bsdsurfertee.jpg" height="240" width="320" /></a></div>
<br />
<br />
I am returning from MeetBSD in San Jose, California. This isn't a Python-related post per se, but the BSD family of operating systems maintains packages and ports for Python and third-party Python libraries, and Python use on these systems is significant in both open source development and commercial settings.<br />
<br />
The conference is structured as a brief weekend unconference. Nonetheless, some of the talks were more than worthy of a full-fledged mega-con, and the rest were solid. It was a good deal.<br />
<br />
Venue: the conference was held at Western Digital. WD sells a variety of hardware. The product they were pushing was a <a href="http://www.wdc.com/en/products/products.aspx?id=1140" target="_blank">several-terabyte little box that updates wirelessly (but not by Bluetooth)</a>. <br />
<br />
We met in a rectangular conference room. All of Silicon Valley seems to me to be an endless office park with nice weather and some landscaped spots (I've included the obligatory Strelitzia/bird of paradise pic from the conference hotel entrance below). It was a fairly intimate setting. The food (a variety of sandwiches) was good. We were warned ahead of time that Wi-Fi was limited; I brought my own Verizon Jetpack unit, so it wasn't an issue for me.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0_xDa5WA-CJkAfsM_Vvne7CTigigrh34QOIwYAQ3YghRv7SfnCw0NOXAYeApyv9m1BBCVA6aYTYHAB45l3L3x9tYZAsmw8eg7sok75IIpWnRmIanEFEpRMAdH-pNREeUl2mEZ2J84y00/s1600/strelizia.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0_xDa5WA-CJkAfsM_Vvne7CTigigrh34QOIwYAQ3YghRv7SfnCw0NOXAYeApyv9m1BBCVA6aYTYHAB45l3L3x9tYZAsmw8eg7sok75IIpWnRmIanEFEpRMAdH-pNREeUl2mEZ2J84y00/s1600/strelizia.jpg" height="240" width="320" /></a></div>
<br />
Talks (that I attended): <br />
<br />
1) Rick Reed, “WhatsApp: Half a billion unsuspecting FreeBSD users” - how WhatsApp uses Erlang and FreeBSD to scale. They are now at 600 million users. It was a good talk, but I wasn't awake and some of it went over my head.<br />
<br />
2) Jordan Hubbard, “FreeBSD: The Next 10 Years” Good talk; I hated it :-(<br />
<br />
Hubbard's leaving Apple a couple of years ago and signing on with iXSystems (a sponsor and essentially the organizer of this conference) made a big splash. He is an accomplished dev and a good guy by all accounts. His ideas are valid on many levels.<br />
<br />
I am primarily an OpenBSD user. I run FreeBSD on my RPi and on a spare laptop for easy access to Java. The two OSes have similar philosophies in some respects (correctness, BSD license, etc.). There is cross-pollination when it comes to operating system components, apps, and drivers. But where OpenBSD unapologetically maintains new releases for older hardware and uncompromisingly adheres to its leader's approach to security and development, FreeBSD, as Hubbard framed it, is looking more towards the future and making changes to attract talented younger core committers and to target more modern (read mobile) platforms. Telemetry, scrapping development on older platforms "ruthlessly," getting younger devs involved by providing work that's interesting to them - all this stuff is important for FreeBSD going forward. At one point he even <gasp> suggested systemd as a good strategy for Linux that FreeBSD should, at least in principle if not in form, emulate.<br />
<br />
FreeBSD is everywhere - or at least in a lot of places that companies just don't make a big deal about. Inside cable (connections) was the one example given. To accommodate mobile and embedded environments, the OS, although well suited to these platforms now, needs to change.<br />
<br />
To my mind, a lot of this goes against OpenBSD's philosophy - purity and security at all costs. My personal philosophy lies with the OpenBSD approach, but I may well be wrong. Hubbard is a guy with a lot of industry know-how and experience, and I am a geologist who uses OpenBSD. He is probably right, but I don't want my fun to stop, so I'm sticking with OpenBSD even if death awaits us . . .<br />
<br />
3) David Maxwell, "The Unix command pipeline - using Unix in the renewable energy era"<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggx7faqIgWfbnDNWua-BaB2eJ39BryuNoO2txYD6Dnbw7DPjEd-YM4coNQXSGW_wnH8JF6SxR9cH2yBp7_WIzsRYKd2FqlJRc338DfZ7sVwFb2UJ66QwnR-qzt04_bcrDvH6dZBVDB2zs/s1600/maxwellbadii.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggx7faqIgWfbnDNWua-BaB2eJ39BryuNoO2txYD6Dnbw7DPjEd-YM4coNQXSGW_wnH8JF6SxR9cH2yBp7_WIzsRYKd2FqlJRc338DfZ7sVwFb2UJ66QwnR-qzt04_bcrDvH6dZBVDB2zs/s1600/maxwellbadii.jpg" height="240" width="320" /></a></div>
<br />
I always liked Maxwell. He's a Canadian guy and a NetBSD devotee.<br />
<br />
His talk was about a command line app he's putting together for better tracking of piped commands on the UNIX command line, and for reproducing, referencing, and inspecting them retroactively more easily than you can now. I think it's got potential and would like to see it succeed.<br />
<br />
After the angst I felt over Hubbard's talk, this was a welcome relief. The UNIX command line is something everyone, or almost everyone, at the con knows and loves. Everyone uses piped commands. This is a useful approach to a common problem - that's something we can all agree on. My favorite talk of the conference (that I attended).<br />
<br />
<br />
4) Alex Rosenberg, "Meet PlayStation 4"<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQNAdptwlF5CUdDwg3eg6xOx6mzbL1gIfdCOxF-lRrDK_XRYqUf-xB7d34Vk5qsZMJkA-nfhmeQ6D0AjTo3jf_wj9hfayXeCwO7W2CSXpsWhAiolSd24WXuu1RNXAlEFGmptS_ImUfBDI/s1600/rosenberggood.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQNAdptwlF5CUdDwg3eg6xOx6mzbL1gIfdCOxF-lRrDK_XRYqUf-xB7d34Vk5qsZMJkA-nfhmeQ6D0AjTo3jf_wj9hfayXeCwO7W2CSXpsWhAiolSd24WXuu1RNXAlEFGmptS_ImUfBDI/s1600/rosenberggood.jpg" height="240" width="320" /></a></div>
<br />
<br />
<br />
<br />
By far the coolest talk. Rosenberg presented this well and spoke about specifics as honestly and openly as he could as a member of a big commercial project. Games require a great deal of optimization at a very low level. Although this theme came up in a number of the talks, on the PlayStation project it's critical. Essentially, the best hardware and hardware architecture for the project is selected for a given product lifecycle (10 years? IIRC), and then you hammer at it with software modifications to get every last bit of efficiency out of it.<br />
<br />
It's not like there's a standard laptop install of FreeBSD on PlayStation 4 and you let it rip with your happy traditional UNIX OS. They're optimizing LLVM and clang (the compiler toolchain), talking directly to the metal as much as possible, and just generally nailing performance at the lowest level of the architecture (after they've gotten the low hanging fruit up top, of course).<br />
<br />
Another theme that came up in almost all the talks, but especially in this one, was the BSD license. Granted, it was a BSD conference, so organizers and attendees have a bias. Nonetheless, it appears that licensing is really critical in the decision to adopt open source software and operating systems. Although "business friendly" nowadays often has "capitalism at its worst" overtones, it was still a theme: the BSD license is the "business friendly" one, whereas the GPL, particularly GPLv3, is not . . . <br />
<br />
I'm not a gamer, but I enjoyed this. Rosenberg is really easy to talk to as well. He let me take that pic up close when we were posing for the group pic after his talk.<br />
<br />
5) Brendan Gregg, "Performance Analysis"<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8FIyWfTNOLKvJVvFJSDaoUfVJLa4E99ctAWR8RbfpEe8Jtcpdv3aUGrDrhWZS6TL0-cZ5mn34GyqN6YzjdnCjVi2ZSrjZCfWRGP2LPpY-OAqVX5q_ovtPHv1WNVV5ifH1OqOzfh-L99U/s1600/greg.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8FIyWfTNOLKvJVvFJSDaoUfVJLa4E99ctAWR8RbfpEe8Jtcpdv3aUGrDrhWZS6TL0-cZ5mn34GyqN6YzjdnCjVi2ZSrjZCfWRGP2LPpY-OAqVX5q_ovtPHv1WNVV5ifH1OqOzfh-L99U/s1600/greg.jpg" height="240" width="320" /></a></div>
<br />
<br />
<br />
Gregg works for Netflix. He's written a lot of DTrace scripts (including numerous Python ones) and has them readily available on GitHub.<br />
<br />
I found myself wishing I knew more about the subject, because performance monitoring is a really cool netadmin problem when, like Netflix, you're dealing with huge bandwidth challenges (as in other talks, so much comes down to optimization).<br />
<br />
That said, Gregg presented some graphical tools that are useful (I'll get the names wrong, so I won't try) - basically histogram-like, color-coded performance charts with labels for processes. You don't have to run your own Netflix to benefit from these, and he's made everything open source and available. If I were a netadmin I would jump on this. I've got to get smarter first before I can benefit from these tools.<br />
<br />
Gregg has a soft British accent and a very amiable demeanor. His was the first talk of the morning. It was like a lullaby. This is one I need to revisit in the videos posted online, because it's worth it.<br />
<br />
<br />
6) Corey Vixie, "Web Apps on Embedded BSD..."<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrLmQi8-A0X2AWp8556OpBUDGHyHAfoIN7XjMgLz2fhyphenhyphenF2sboQrAGAWBXn6eaNmBhgDX-tvr89Wbhohgh1ScHs18yn17r1-RjX5u6Vf4BP-xR3eLgsaO0WOPSQZMU33NHaxQPem7lNS84/s1600/vixie.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrLmQi8-A0X2AWp8556OpBUDGHyHAfoIN7XjMgLz2fhyphenhyphenF2sboQrAGAWBXn6eaNmBhgDX-tvr89Wbhohgh1ScHs18yn17r1-RjX5u6Vf4BP-xR3eLgsaO0WOPSQZMU33NHaxQPem7lNS84/s1600/vixie.jpg" height="240" width="320" /></a></div>
<br />
<br />
<br />
<br />
The iXSystems surprise talk, but a good one. The youngster Vixie briefed us a bit on what iXSystems is doing with the web presentation layer (for lack of a better description) of the FreeNAS implementation.<br />
<br />
He started off by saying static web pages are, at least for apps like FreeNAS, not the way to go anymore. Refreshing the DOM (Document Object Model) at regular intervals is not going to work well. He then introduced us to a number of mature and nascent JavaScript/web technologies, some of which no one in the room had yet heard of. Basically, he had to rewrite the "old" Django/other technologies implementation to better simulate a desktop app in the browser.<br />
<br />
The specifics were not something I could follow well because of my ignorance. There was talk of an open source, BSD-licensed Facebook framework whose name I can't recall, a one-way change propagation architecture for updating the dynamic web page, and, as always, optimization of the process. I asked him about Django after the talk. He said it was the best thing a couple of years ago for this app, but now they needed something that could interact directly with the browser - namely JavaScript - it comes down to fine-grained control and optimization.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<object class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="https://ytimg.googleusercontent.com/vi/0G-o2TuP_Kc/0.jpg" height="266" width="320"><param name="movie" value="https://youtube.googleapis.com/v/0G-o2TuP_Kc&source=uds" /><param name="bgcolor" value="#FFFFFF" /><param name="allowFullScreen" value="true" /><embed width="320" height="266" src="https://youtube.googleapis.com/v/0G-o2TuP_Kc&source=uds" type="application/x-shockwave-flash" allowfullscreen="true"></embed></object></div>
<br />
One humorous interlude during the Q & A was my asking him if he was indeed related to Paul Vixie, historical UNIX tools author (Vixie Cron), to which he replied, "This is the part of my talk where I say, 'I am Worf, son of Mogh.'" Anyone with a sense of humor and a knowledge of STTNG can't be all bad ;-)<br />
<br />
A few people pics:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQWYms0TNu-rNz-yvMpkHdwcT0O7_tO7sSEna4J4-XZPJ614-Jy3YDyhJJ6xIVnhT2MX7tC2MOsWNuNRVQw0pj0EKwnj9PzvVSSTistWcU_zrI_ZbYQWSsI-OWyj2GjPm-B-5cNE7mbvU/s1600/dru.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQWYms0TNu-rNz-yvMpkHdwcT0O7_tO7sSEna4J4-XZPJ614-Jy3YDyhJJ6xIVnhT2MX7tC2MOsWNuNRVQw0pj0EKwnj9PzvVSSTistWcU_zrI_ZbYQWSsI-OWyj2GjPm-B-5cNE7mbvU/s1600/dru.jpg" height="240" width="320" /></a></div>
<br />
<br />
<br />
<br />
Dru Lavigne. Without the BSDA cert program she helped found, I would never have gotten over the hump learning UNIX. We differ on our choice of specific BSD, but I still consider her my UNIX mentor.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibahl3vo7xnltQjJCoD65z8VnfcN6mY0V6drUeUeLl7fvut3aJ5CN8P7SrdcQBoCKT36wlnHMVUThM2CxGJM_NklPPh0WqHjuADENE-D5C9u3st7_hTB05t7E5BaGKI0wYWCGX6gq4gkE/s1600/deniseandmatt.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibahl3vo7xnltQjJCoD65z8VnfcN6mY0V6drUeUeLl7fvut3aJ5CN8P7SrdcQBoCKT36wlnHMVUThM2CxGJM_NklPPh0WqHjuADENE-D5C9u3st7_hTB05t7E5BaGKI0wYWCGX6gq4gkE/s1600/deniseandmatt.jpg" height="240" width="320" /></a></div>
<br />
<br />
<br />
<br />
iXSystems old-timers Denise and Matt working out conference specifics.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEij8zIacYtz1rygNIYdUhW4cUUYoTyiVdgfn7bwSibHpawBsuBzvyAbeUEDu6zWQctrdXcpTFCe7OFgVeSYBZOCB0qRTqGiwcuLRl-jOQq4ivdWFxYG33dPxm7KiLkJBuD_reXPwnypF3Q/s1600/annefreebsdfoundation.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEij8zIacYtz1rygNIYdUhW4cUUYoTyiVdgfn7bwSibHpawBsuBzvyAbeUEDu6zWQctrdXcpTFCe7OFgVeSYBZOCB0qRTqGiwcuLRl-jOQq4ivdWFxYG33dPxm7KiLkJBuD_reXPwnypF3Q/s1600/annefreebsdfoundation.jpg" height="240" width="320" /></a></div>
<br />
<br />
<br />
<br />
FreeBSD Foundation rep Anne.<br />
<br />
Conclusion: MeetBSD is an affordable, pretty meaty con if you like UNIX, hardware, and topics about optimization and scale. It is, fortunately or unfortunately, a pretty well kept secret.<br />
<br />
Thanks for stopping by.Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com2tag:blogger.com,1999:blog-524230429673765509.post-16210702140347624232014-10-31T00:15:00.000-07:002014-10-31T00:15:18.108-07:00Gtk.TreeView (grid view) with mono, gtk-sharp, and IronPythonThe post immediately prior to this one was an attempt to reproduce Windows.Forms Calendar controls in Gtk for effective cross-platform (Windows/*nix) rendering.<br />
<br />
This time I am attempting to get familiar with gtk-sharp/Gtk's version of a grid view - the Gtk.TreeView object. Some of the gtk-sharp documentation suggests the NodeView object would be easier to use. I had some trouble instantiating the objects associated with the NodeView and went with the TreeView instead in the hopes of getting more control.<br />
<br />
The Windows.Forms GridView I did years ago is <a href="http://www.ironpython.info/index.php?title=DataGridView_Custom_Formatting" target="_blank">here</a>. It became apparent to me shortly after embarking on this journey that I would be hard pressed to recreate all the functionality of that script in a timely manner. I settled for a tabular view of drillhole data (fabricated, mock data) with some custom formatting.<br />
<br />
Aside: this is typically how mineral exploration drillhole data (core, reverse circulation drilling) is presented in tabular format - a series of from-to intervals with assay values. Assuming the assays are all separate elements, the reported weight percents should not sum to more than 100%, and they never do unless someone fat-fingers a decimal place. I've projected a couple of screaming hot polymetallic drill holes that end near surface (lack of funding for drilling), but show enough promise that the new mining town of Trachteville (the drill hole name CBT-BNZA stands for CBT-Bonanza) will spring up there at any moment . . . one can dream.<br />
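The two sanity checks implied by that format, contiguous from-to intervals and assays that don't sum past 100%, are easy to automate. This is a minimal sketch with a hypothetical function name and row layout; the sample rows are the mock values from the DHDATAX dictionary in the IronPython script in this post. Depth units are left unspecified, and depths are compared exactly, which assumes each "to" value is recorded identically as the next interval's "from".

```python
def check_intervals(intervals, max_total=100.0):
    """Flag suspicious rows in from-to drillhole interval data.

    intervals: iterable of (hole_id, from_depth, to_depth, assays),
    where assays is a list of weight percents.
    Returns a list of (hole_id, from_depth, problem) tuples.
    """
    problems = []
    last_to = {}  # hole id -> 'to' depth of that hole's previous interval
    for hole, frm, to, assays in sorted(intervals, key=lambda r: (r[0], r[1])):
        if to <= frm:
            problems.append((hole, frm, 'to depth not greater than from depth'))
        if hole in last_to and last_to[hole] != frm:
            problems.append((hole, frm, 'gap or overlap with previous interval'))
        if sum(assays) > max_total:
            problems.append((hole, frm,
                             'assays sum to more than {0}%'.format(max_total)))
        last_to[hole] = to
    return problems


DRILLHOLES = [
    ('DH_CBTBNZA-01', 0.0, 8.7, [22.27, 4.93, 18.75, 35.18]),
    ('DH_CBTBNZA-01', 8.7, 15.3, [0.27, 0.09, 0.03, 0.22]),
    ('DH_CBTBNZA-01', 15.3, 25.3, [2.56, 11.34, 0.19, 13.46]),
    ('DH_CBTBNZA-02', 0.0, 10.0, [0.07, 1.23, 4.78, 5.13]),
    ('DH_CBTBNZA-02', 10.0, 20.0, [44.88, 12.97, 0.19, 0.03]),
]

print(check_intervals(DRILLHOLES))  # -> []
```

The mock data passes both checks; a fat-fingered assay such as 95.0 next to 12.0 in one interval would show up as a single flagged row.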
<br />
The grid view's data store object, Gtk.ListStore, would not instantiate in IronPython. I was not the only person to have experienced this problem (I cannot locate the link to the mailing list thread or forum reference, but like the big fish that got away, I swear I saw it). I didn't want to drop the effort just because of that, so I hacked together and compiled some C# code:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">public class storex<br />{<br /> public Gtk.ListStore drillhole = <br /> // 7 columns<br /> // drillhole id <br /> new Gtk.ListStore (typeof (string),<br /> // from<br /> typeof (double),<br /> // to<br /> typeof (double),<br /> // assay1<br /> typeof (double),<br /> // assay2<br /> typeof (double),<br /> // assay3<br /> typeof (double),<br /> // assay4<br /> typeof (double));<br />}</span><br />
<br />
<span style="font-family: inherit;">The mono command on Windows was</span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">C:\UserPrograms\Mono-3.2.3>mcs -pkg:gtk-sharp-2.0 /target:library C:\UserPrograms\IronPythonGUI\storex.cs </span><br />
<br />
<span style="font-family: inherit;">Those are my file paths; locations depend on where you install things like mono and IronPython.</span><br />
<br />
<span style="font-family: inherit;">Anyway, I got my dll and I was off to the races. Getting to know the Gtk and gtk-sharp object model proved challenging for me. I'm glad I got some familiarity with it, but it would take me longer to do something in Gtk than it did with Windows.Forms. The most fun and gratifying part of the project was getting the custom formatting to work with a Gtk.TreeCellDataFunc. I used a function that yielded specific functions for each column - something that's really easy to do in Python.</span><br />
<br />
<span style="font-family: inherit;">Anyway, here are a couple screenshots and the IronPython code:</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSL2CJehKs5j-Rs9DIPMPFMgoJE_Fqsdf53tJ-WnmX1JZiPxEqORj_mQUQkfFDZquBiecnZ21SYXqKmqw5suAKKiOIu1FuCQpql1ClZ0BPZgypBZgTDzqfhmoaKOEna46LenTPzoNBwus/s1600/windowsgridviewmono.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSL2CJehKs5j-Rs9DIPMPFMgoJE_Fqsdf53tJ-WnmX1JZiPxEqORj_mQUQkfFDZquBiecnZ21SYXqKmqw5suAKKiOIu1FuCQpql1ClZ0BPZgypBZgTDzqfhmoaKOEna46LenTPzoNBwus/s1600/windowsgridviewmono.PNG" height="96" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRmKE8MvIxWzytg834H6FDkNhe3vPel1nerX3k-qtmmWUq1KZL5IVENySHmggp9iNaCq-baCnHQihhIGmrbDlXvPemiYJBqiKg2Dh-Es-cUtRXex21pEPM6C1V4O_Yn8UXrYXKcUsXEEY/s1600/openbsddrillhole.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRmKE8MvIxWzytg834H6FDkNhe3vPel1nerX3k-qtmmWUq1KZL5IVENySHmggp9iNaCq-baCnHQihhIGmrbDlXvPemiYJBqiKg2Dh-Es-cUtRXex21pEPM6C1V4O_Yn8UXrYXKcUsXEEY/s1600/openbsddrillhole.png" height="115" width="320" /></a></div>
<br />
<br />
<br />
<br />
<span style="font-family: inherit;">The OpenBSD one below turned out pretty good, but the Windows one had a little double line underneath the first row - it looked as though it was still trying to select that row when I told it specifically not to. I'm not a design perfectionist Steve Jobs type, but niggling nits like that drive me batty. For now, though it's best I publish the code and move on.</span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">#!/usr/local/bin/mono /home/carl/IronPython-2.7.4/ipy64.exe<br /><br />import clr<br /><br />GTKSHARP = 'gtk-sharp'<br />PANGO = 'pango-sharp'<br /><br /># Mock store C#<br />STOREX = 'storex'<br /><br />clr.AddReference(GTKSHARP)<br />clr.AddReference(PANGO)<br /><br /># C# module compiled for this project.<br /># Problems with Gtk.ListStore in IronPython.<br />clr.AddReference(STOREX)<br /><br />import Gtk<br />import Pango<br /><br />import storex<br /><br />TITLE = 'Gtk.TreeView Demo (Drillholes)'<br />MARKUP = '<span font="Courier New" size="14" weight="bold">{:s}</span>'<br />MARKEDUPTITLE = MARKUP.format(TITLE)<br /><br />CENTERED = 0.5<br />RIGHT = 1.0<br /><br />WINDOWWIDTH = 350<br /><br />COURFONTREGULAR = 'Courier New 12'<br />COURFONTBOLD = 'Courier New Bold 12'<br /><br />DHNAME = 'DH_CBTBNZA-{:>02d}'<br />DHNAMELABEL = 'drillhole'<br />FROM = 'from'<br />TO = 'to'<br />ASSAY1 = 'assay1'<br />ASSAY2 = 'assay2'<br />ASSAY3 = 'assay3'<br />ASSAY4 = 'assay4'<br /><br />FP1FMT = '{:>5.1f}'<br />FP2FMT = '{:>4.2f}'<br /><br />DHDATAX = {(DHNAME.format(1), 0.0):{TO:8.7,<br /> ASSAY1:22.27,<br /> ASSAY2:4.93,<br /> ASSAY3:18.75,<br /> ASSAY4:35.18},<br /> (DHNAME.format(1), 8.7):{TO:15.3,<br /> ASSAY1:0.27,<br /> ASSAY2:0.09,<br /> ASSAY3:0.03,<br /> ASSAY4:0.22},<br /> (DHNAME.format(1), 15.3):{TO:25.3,<br /> ASSAY1:2.56,<br /> ASSAY2:11.34,<br /> ASSAY3:0.19,<br /> ASSAY4:13.46},<br /> (DHNAME.format(2), 0.0):{TO:10.0,<br /> ASSAY1:0.07,<br /> ASSAY2:1.23,<br /> ASSAY3:4.78,<br /> ASSAY4:5.13},<br /> (DHNAME.format(2), 10.0):{TO:20.0,<br /> ASSAY1:44.88,<br /> ASSAY2:12.97,<br /> ASSAY3:0.19,<br /> ASSAY4:0.03}}<br /><br />FIELDS = [DHNAMELABEL, FROM, TO, ASSAY1, ASSAY2, ASSAY3, ASSAY4]<br />BOLDEDCOLUMNS = [DHNAMELABEL, FROM, TO] <br />NONKEYFIELDS = FIELDS[2:]<br /><br />BLAZINGCUTOFF = 10.0<br /><br />def genericfloatformat(floatfmt, index):<br /> """<br /> For cell formatting in 
Gtk.TreeView.<br /><br /> Returns a function to format floats<br /> and to format floats' foreground color<br /> based on cutoff value.<br /><br /> floatfmt is a format string.<br /><br /> index is an int that indicates the<br /> column being formatted.<br /> """<br /> def setfloatfmt(treeviewcolumn, cellrenderer, treemodel, treeiter):<br /> cellrenderer.Text = floatfmt.format(treemodel.GetValue(treeiter, index))<br /> # If it is one of the assay value columns.<br /> # XXX - not generic.<br /> if index > 2:<br /> if treemodel.GetValue(treeiter, index) > BLAZINGCUTOFF:<br /> cellrenderer.Foreground = 'red'<br /> else:<br /> cellrenderer.Foreground = 'black'<br /> return Gtk.TreeCellDataFunc(setfloatfmt)<br /><br />class TreeViewTest(object):<br /> def __init__(self):<br /> Gtk.Application.Init()<br /> self.window = Gtk.Window('')<br /> # DeleteEvent - copied from Gtk demo on internet.<br /> self.window.DeleteEvent += self.DeleteEvent<br /> # Frame property provides a frame and title.<br /> self.frame = Gtk.Frame(MARKEDUPTITLE)<br /> self.tree = Gtk.TreeView()<br /> self.tree.EnableGridLines = Gtk.TreeViewGridLines.Both<br /> self.frame.Add(self.tree)<br /><br /> # Fonts for formatting.<br /> self.fdregular = Pango.FontDescription.FromString(COURFONTREGULAR)<br /> self.fdbold = Pango.FontDescription.FromString(COURFONTBOLD)<br /><br /> # C# module<br /> self.store = storex().drillhole<br /><br /> self.makecolumns()<br /> self.adddata()<br /> self.tree.Model = self.store<br /><br /> self.formatcolumns()<br /> self.formatcells()<br /> self.prettyup()<br /><br /> self.window.Add(self.frame)<br /> self.window.ShowAll()<br /> # Keep text viewable - size no smaller than intended.<br /> self.window.AllowShrink = False<br /> # XXX - hack to keep lack of gridlines on edges of<br /> # table from showing.<br /> self.window.AllowGrow = False<br /> # Unselect everything for this demo.<br /> self.tree.Selection.UnselectAll()<br /> Gtk.Application.Run()<br /><br /> def 
makecolumns(self):<br /> """<br /> Fill in columns for TreeView.<br /> """<br /> self.columns = {}<br /> for fieldx in FIELDS:<br /> self.columns[fieldx] = Gtk.TreeViewColumn()<br /> self.columns[fieldx].Title = fieldx<br /> self.tree.AppendColumn(self.columns[fieldx])<br /><br /> def formatcolumns(self):<br /> """<br /> Make custom labels for columnn headers.<br /><br /> Get each column properly justified (all<br /> are right justified,floating point numbers<br /> except for the drillhole 'number' - <br /> actually a string).<br /> """<br /> self.customlabels = {}<br /><br /> for fieldx in FIELDS:<br /> # This centers the labels at the top.<br /> self.columns[fieldx].Alignment = CENTERED<br /> self.customlabels[fieldx] = Gtk.Label(self.columns[fieldx].Title)<br /> self.customlabels[fieldx].ModifyFont(self.fdbold)<br /> # 120 is about right for from, to, and assay columns.<br /> self.columns[fieldx].MinWidth = 120<br /> self.customlabels[fieldx].ShowAll()<br /> self.columns[fieldx].Widget = self.customlabels[fieldx]<br /> # ShowAll required for new label to take.<br /> self.columns[fieldx].Widget.ShowAll()<br /><br /> def formatcells(self):<br /> """<br /> Add and format cell renderers.<br /> """<br /> self.cellrenderers = {}<br /><br /> for fieldx in FIELDS:<br /> self.cellrenderers[fieldx] = Gtk.CellRendererText()<br /> self.columns[fieldx].PackStart(self.cellrenderers[fieldx], True)<br /> # Drillhole 'number' (string)<br /> if fieldx == FIELDS[0]:<br /> self.cellrenderers[fieldx].Xalign = CENTERED<br /> self.columns[fieldx].AddAttribute(self.cellrenderers[fieldx], <br /> 'text', 0)<br /> else:<br /> self.cellrenderers[fieldx].Xalign = RIGHT<br /> try:<br /> self.columns[fieldx].AddAttribute(self.cellrenderers[fieldx], <br /> 'text', FIELDS.index(fieldx))<br /> except ValueError:<br /> print('\n\nProblem with field definitions; field not found.\n\n')<br /> for fieldx in BOLDEDCOLUMNS:<br /> self.cellrenderers[fieldx].Font = COURFONTBOLD<br /> 
self.columns[fieldx].Widget.ShowAll()<br /><br /> # XXX - not very generic, but better than doing them one by one.<br /> # from, to columns.<br /> for x in xrange(1, 3):<br /> self.columns[FIELDS[x]].SetCellDataFunc(self.cellrenderers[FIELDS[x]],<br /> genericfloatformat(FP1FMT, x))<br /> # assay&lt;x&gt; columns.<br /> for x in xrange(3, 7):<br /> self.columns[FIELDS[x]].SetCellDataFunc(self.cellrenderers[FIELDS[x]],<br /> genericfloatformat(FP2FMT, x))<br /><br /> def usemarkup(self):<br /> """<br /> Refreshes UseMarkup property on widgets (labels)<br /> so that they display properly and without <br /> markup text.<br /> """<br /> # Have to refresh this property each time.<br /> self.frame.LabelWidget.UseMarkup = True<br /><br /> def prettyup(self):<br /> """<br /> Get Gtk objects looking the way we<br /> intended.<br /> """<br /> # Try to get Courier New on treeview.<br /> self.tree.ModifyFont(self.fdregular)<br /> # Get rid of line.<br /> self.frame.Shadow = Gtk.ShadowType.None<br /> self.usemarkup()<br /><br /> def adddata(self):<br /> """<br /> Put data into store.<br /> """<br /> # XXX - difficulty figuring out sorting<br /> # function for TreeView. Hack it<br /> # with dictionary here.<br /> keytuples = [key for key in DHDATAX]<br /> keytuples.sort()<br /> datax = []<br /> for tuplex in keytuples:<br /> # XXX - side effect comprehension.<br /> # Not great for readability,<br /> # but compact.<br /> [datax.append(x) for x in tuplex]<br /> for fieldx in NONKEYFIELDS:<br /> datax.append(DHDATAX[tuplex][fieldx])<br /> self.store.AppendValues(*datax)<br /> # Reinitialize data row list.<br /> datax = []<br /><br /> def DeleteEvent(self, widget, event):<br /> Gtk.Application.Quit()<br /><br />if __name__ == '__main__':<br /> TreeViewTest()</span><br />
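genericfloatformat() is a closure factory: each call captures its own floatfmt and index and hands back a callback for SetCellDataFunc. Below is a plain-Python sketch of the same pattern that runs without gtk-sharp. CUTOFF, the format strings, and the sample row are hypothetical stand-ins for BLAZINGCUTOFF, FP1FMT, FP2FMT, and the drillhole data. Instead of setting renderer properties, the callback returns a (text, color) pair.

```python
# Hypothetical stand-in for BLAZINGCUTOFF.
CUTOFF = 1.0


def make_cell_formatter(floatfmt, index, cutoff=CUTOFF, first_assay_column=3):
    """Return a callback mapping a data row to (formatted text, color).

    Mirrors genericfloatformat(): the format string and column index are
    captured once, and only assay columns (index 3 and up) are colored.
    """
    def format_cell(row):
        value = row[index]
        text = floatfmt.format(value)
        if index >= first_assay_column and value > cutoff:
            return text, 'red'
        return text, 'black'
    return format_cell


if __name__ == '__main__':
    # Hypothetical row: hole id, from, to, then four assay values.
    row = ('DH-01', 10.0, 12.5, 0.25, 1.75, 0.5, 2.0)
    print(make_cell_formatter('{:.1f}', 2)(row))  # ('12.5', 'black')
    print(make_cell_formatter('{:.2f}', 4)(row))  # ('1.75', 'red')
```

Calling a factory once per column gives each callback its own index. If the callback were defined directly inside the `for x in ...` loop, it would capture the loop variable itself, and every column would end up using the last value of x.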
<br />
<span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;">Thanks for stopping by.</span> </span><br />
Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com0tag:blogger.com,1999:blog-524230429673765509.post-68233956333895075872014-10-30T23:25:00.000-07:002014-10-31T00:16:11.228-07:00Mono gtk-sharp IronPython CalendarViewA number of years ago I did a <a href="http://www.ironpython.info/index.php?title=MonthCalendar_Control_and_datetime" target="_blank">post on the IronPython Cookbook site about the Windows.Forms Calendar control</a>. I could never get the thing to render nicely on *nix operating systems (the BSD family). It sounds as though Windows.Forms development for Mono (and in general) is kind of dead, so there is not much hope that that example will ever render nicely on *nix. Recently I've been playing with Mono and decided to give gtk-sharp a shot with IronPython.<br />
<br />
Quick disclaimers:<br />
<br />
1) From the examples I've seen on the internet, I suspect that PyGtk is a little easier to deal with than gtk-sharp. That's OK; I wanted to use IronPython and have the rest of the Mono/.NET framework available, so I went to the extra trouble of forgoing CPython and PyGtk in favor of IronPython and gtk-sharp.<br />
<br />
2) The desktop is not the most cutting-edge or sexy platform in 2014. Nonetheless, where I work it is alive and well. When I no longer see engineers hacking solutions in Excel and VBA, I'll consider the possibility of outliving the desktop. Right now I'm not hopeful :-\<br />
<br />
The results aren't bad, at least as far as rendering goes. I couldn't get the Courier font to take on OpenBSD, but the Gtk Calendar control looks acceptable. All in all, I was OK with the results on both Windows and OpenBSD. I've heard Gtk doesn't do quite as well on Apple products, but I don't own a Mac to test with. Here are a couple of screenshots:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNCw9xQmD9k8Rb5xmGPfERnZX7eMtCuDbQHxq0jYke9SN0FuutKCnNBRzX04CC10H1auzkRbYnXsihKKxy6_IJXMctMUuoCvGAODFtqB5piSDr7h82_h59rUmN7R_ytqudvKgorzarg-Y/s1600/ipycalmono.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNCw9xQmD9k8Rb5xmGPfERnZX7eMtCuDbQHxq0jYke9SN0FuutKCnNBRzX04CC10H1auzkRbYnXsihKKxy6_IJXMctMUuoCvGAODFtqB5piSDr7h82_h59rUmN7R_ytqudvKgorzarg-Y/s1600/ipycalmono.PNG" height="136" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2bZZ9TfaziZVzWmgK4gEW0uL34tajKi1g2G_Vy6G7TO5HPa6uPL_KS_jz28eeCvj-SS8XSK54UgjOFcG_RXYSt1_DBQhoUkIo2LAOhPs_cSPdCX7T5umvwlvU7uL66o0aRFgemPeBTws/s1600/openbsdcal.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2bZZ9TfaziZVzWmgK4gEW0uL34tajKi1g2G_Vy6G7TO5HPa6uPL_KS_jz28eeCvj-SS8XSK54UgjOFcG_RXYSt1_DBQhoUkIo2LAOhPs_cSPdCX7T5umvwlvU7uL66o0aRFgemPeBTws/s1600/openbsdcal.png" height="198" width="320" /></a></div>
<br />
I run the cwm window manager on OpenBSD and have it set up to cut out borders on windows, hence the more minimalist look to the control there.<br />
<br />
IronPython console output on *nix has always come out yellow or white, which doesn't show up on the white background I prefer. To get around this, I run an xterm with a black background:<br />
<br />
<span style="font-family: "Courier New", Courier, monospace;">xterm -bg black -fg white</span><br />
<br />
<span style="font-family: "Courier New", Courier, monospace;"><span style="font-family: inherit;">Here is the code for the gtk-sharp Gtk.Calendar control:</span></span><br />
<br />
<span style="font-family: "Courier New", Courier, monospace;"><span style="font-family: inherit;"><span style="font-family: "Courier New",Courier,monospace;">#!/usr/local/bin/mono /home/carl/IronPython-2.7.4/ipy64.exe<br /><br />import clr<br /><br />GTKSHARP = 'gtk-sharp'<br />PANGO = 'pango-sharp'<br /><br />clr.AddReference(GTKSHARP)<br />clr.AddReference(PANGO)<br /><br />import Gtk<br />import Pango<br /><br />import datetime<br /><br />TITLE = 'Gtk.Calendar Demo'<br />MARKUP = '&lt;span font="Courier New" size="14" weight="bold"&gt;{:s}&lt;/span&gt;'<br />MARKEDUPTITLE = MARKUP.format(TITLE)<br /><br />INFOMSG = '&lt;span font="Courier New 12"&gt;\n\n Program set to run for:\n\n '<br />INFOMSG += '{:%Y-%m-%d}\n\n&lt;/span&gt;'<br /><br />DATEDIFFMSG = '&lt;span font="Courier New 12"&gt;\n\n '<br />DATEDIFFMSG += 'There are {0:d} days between the\n'<br />DATEDIFFMSG += ' beginning of the epoch and\n'<br />DATEDIFFMSG += ' {1:%Y-%m-%d}.\n\n&lt;/span&gt;'<br /><br />ALIGNMENTPARAMS = (0.0, 0.5, 0.0, 0.0)<br /><br />WINDOWWIDTH = 350<br /><br />CALENDARFONT = 'Courier New Bold 12'</span><br /><br /><span style="font-family: "Courier New",Courier,monospace;">class CalendarTest(object):<br /> # Beginning of the epoch: midnight, Jan 1, 1970.<br /> # (fromtimestamp(0) would return local time, not GMT.)<br /> inthebeginning = datetime.datetime(1970, 1, 1)<br /> # Debug info.<br /> print(inthebeginning)<br /> def __init__(self):<br /> Gtk.Application.Init()<br /> self.window = Gtk.Window(TITLE)<br /> # DeleteEvent - copied from Gtk demo on internet.<br /> self.window.DeleteEvent += self.DeleteEvent<br /> # Frame property provides a frame and title.<br /> self.frame = Gtk.Frame(MARKEDUPTITLE)<br /> self.calendar = Gtk.Calendar()<br /> # Handles date selection event.<br /> self.calendar.DaySelected += self.dateselect<br /> # Sets up text for labels.<br /> self.getcaltext()<br /> # Puts little box around text.<br /> self.datelabelframe = Gtk.Frame()<br /> # Try to get datelabel to align with other label.<br /> self.datelabelalignment = 
Gtk.Alignment(*ALIGNMENTPARAMS)<br /> self.datelabel = Gtk.Label(self.caltext)<br /> self.datelabelalignment.Add(self.datelabel)<br /> self.datelabelframe.Add(self.datelabelalignment)<br /> # Puts little box around text.<br /> self.datedifflabelframe = Gtk.Frame()<br /> self.datedifflabelalignment = Gtk.Alignment(*ALIGNMENTPARAMS)<br /> self.datedifflabel = Gtk.Label(self.timedifftext)<br /> self.datedifflabelalignment.Add(self.datedifflabel)<br /> self.datedifflabelframe.Add(self.datedifflabelalignment)<br /> self.vbox = Gtk.VBox()<br /> self.vbox.PackStart(self.datelabelframe)<br /> self.vbox.PackStart(self.datedifflabelframe)<br /> self.vbox.PackStart(self.calendar)<br /> self.frame.Add(self.vbox)<br /> self.window.Add(self.frame)<br /> self.prettyup()<br /> self.window.ShowAll()<br /> # Keep text viewable - size no smaller than intended.<br /> self.window.AllowShrink = False<br /> Gtk.Application.Run()<br /><br /> def getcaltext(self):<br /> """<br /> Get messages for run date.<br /> """<br /> # Calendar month is 0 based.<br /> yearmonthday = self.calendar.Year, self.calendar.Month + 1, self.calendar.Day<br /> chosendate = datetime.datetime(*yearmonthday)<br /> self.caltext = INFOMSG.format(chosendate)<br /> # For reporting of number of days since beginning of epoch.<br /> timediff = chosendate - CalendarTest.inthebeginning<br /> self.timedifftext = DATEDIFFMSG.format(timediff.days, chosendate)<br /><br /> def usemarkup(self):<br /> """<br /> Refreshes UseMarkup property on widgets (labels)<br /> so that they display properly and without <br /> markup text.<br /> """<br /> # Have to refresh this property each time.<br /> self.frame.LabelWidget.UseMarkup = True<br /> self.datelabel.UseMarkup = True<br /> self.datedifflabel.UseMarkup = True<br /><br /> def prettyup(self):<br /> """<br /> Get Gtk objects looking the way we<br /> intended.<br /> """<br /> # Try to make frame wider.<br /> # XXX<br /> # Works nicely on Windows - try on Unix.<br /> # Allows bold, 
etc.<br /> self.usemarkup()<br /> self.frame.SetSizeRequest(WINDOWWIDTH, -1)<br /> # Get rid of line in middle of text on title.<br /> self.frame.Shadow = Gtk.ShadowType.None<br /> # Try to get Courier New on calendar.<br /> fd = Pango.FontDescription.FromString(CALENDARFONT)<br /> self.calendar.ModifyFont(fd)<br /> self.datelabel.Justify = Gtk.Justification.Left<br /> self.datedifflabel.Justify = Gtk.Justification.Left<br /> self.window.Title = ''<br /> self.usemarkup()<br /><br /> def dateselect(self, widget, event):<br /> self.getcaltext()<br /> self.datelabel.Text = self.caltext<br /> self.datedifflabel.Text = self.timedifftext<br /> self.prettyup()<br /><br /> def DeleteEvent(self, widget, event):<br /> Gtk.Application.Quit()<br /><br />if __name__ == '__main__':<br /> CalendarTest()</span></span></span><br />
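The date arithmetic in getcaltext() doesn't depend on Gtk, so it can be sketched in plain Python. Two details matter here. First, Gtk.Calendar's Month is 0-based. Second, datetime.datetime.fromtimestamp(0) returns the epoch in local time rather than GMT; on a US Pacific machine, for example, it gives 1969-12-31 16:00. East of GMT, that local epoch falls after midnight on January 1, which can make the .days count one short. A fixed naive datetime(1970, 1, 1) avoids that problem. days_since_epoch() is a hypothetical helper name.

```python
import datetime

# Start of the Unix epoch as a naive datetime: midnight, Jan 1, 1970.
# Unlike datetime.datetime.fromtimestamp(0), this does not depend on the
# machine's local time zone.
EPOCH = datetime.datetime(1970, 1, 1)


def days_since_epoch(year, month0, day):
    """Whole days from the epoch to a date picked on the calendar.

    month0 is 0-based (0 = January), like Gtk.Calendar's Month,
    so it is shifted by one before building the datetime.
    """
    chosen = datetime.datetime(year, month0 + 1, day)
    return (chosen - EPOCH).days


if __name__ == '__main__':
    # October 30, 2014 is month index 9 on the calendar widget.
    print(days_since_epoch(2014, 9, 30))  # 16373
```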
<br />
<span style="font-family: "Courier New", Courier, monospace;"><span style="font-family: inherit;"><span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;">Thanks for stopping by.</span> </span></span> </span><br />
<br />
Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com0tag:blogger.com,1999:blog-524230429673765509.post-65426089010210323152014-10-20T14:11:00.002-07:002014-10-20T14:11:32.639-07:00subprocess.Popen() or Abusing a Home-grown Windows ExecutableEach month I redo 3D block model interpolations for a series of open pits at a distant mine. Those of you who follow my Twitter feed often see me tweet, "The 3D geologic block model interpolation chuggeth . . ." What's going on is that I've got all the processing power maxed out dealing with millions of model blocks and thousands of data points. The machine heats up and, with the fan going, sounds like a DC-9 warming up before flight.<br /><br />All that said, running everything roughly in parallel is more efficient time-wise than running it sequentially: an hour of chugging is better than four. I've been doing this with the Popen class from the Python (2.7) subprocess module, running the interpolations of my five values in parallel. Our Python programmer Lori originally wrote this to run in sequence for a different set of problems; I bastardized it for my own.<br />
<br />
The subprocess part of the code is relatively straightforward; the function startprocess() covers it.<br /><br />What makes this problem a little more challenging:<br />
<br />
1) It's a vendor-supplied executable . . . without an API or source . . . that's interactive (you can't pass it the config file path; it prompts for it). This results in a number of time.sleep() and &lt;process&gt;.stdin.write() calls that can be brittle.<br />
<br />
2) Getting the processes started is easy, as I just mentioned. Knowing when to stop (kill) them requires knowledge of the app and how it generates output. I've gone for an ugly but effective check of the report file contents.<br />
<br />
3) While waiting for the processes to finish their work, I need to know that things are working and what's going on. I've accomplished this by reporting the data files' sizes in MB.<br />
<br />
4) The executable isn't designed for a centralized code base (typically all scripts are kept in a folder for the specific project or pit), so it only accepts file paths of about 100 characters or fewer. I've omitted this from my sanitized version of the code, but it made things even messier than they are below. Also, I don't know if all Windows programs do this, but the paths need to be inside quotes; the path kept breaking on the colon (:) when not quoted.<br />
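For an interactive program that reads its answers from standard input, one alternative to the sleep-and-write sequence is to queue all of the answers on the pipe up front and then close stdin. The child reads each line when it reaches the corresponding prompt. This is a hedged sketch, not the script from this post. run_interactive() is a hypothetical name, and the small Python 3 child is a hypothetical stand-in for the consultant's executable. A program that reads the Windows console directly instead of stdin won't cooperate with this approach. The sleeps also help keep the screen output of the five runs from interleaving. Treat this as something to try, not a drop-in fix.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the interactive executable: it prompts for a
# driver file path and then waits for two presses of Enter.
CHILD = (
    "path = input('Driver file? ')\n"
    "input()\n"
    "input()\n"
    "print('got ' + path)\n"
)


def run_interactive(cmd, answers, output_path):
    """Start cmd, queue every answer on stdin at once, return (Popen, file).

    The OS pipe buffers the answers, so no time.sleep() is needed between
    writes as long as the program reads them from stdin.
    """
    out = open(output_path, 'w')
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=out,
                            universal_newlines=True)
    proc.stdin.write('\n'.join(answers) + '\n')
    proc.stdin.close()  # flush the answers and signal end of input
    return proc, out


if __name__ == '__main__':
    outpath = os.path.join(tempfile.mkdtemp(), 'ASSAY1output.txt')
    # The driver path keeps its double quotes, as in the post.
    proc, out = run_interactive([sys.executable, '-c', CHILD],
                                ['"C:/hypothetical/KDriver.csv"', '', ''],
                                outpath)
    proc.wait()
    out.close()
    with open(outpath) as f:
        print(f.read())
```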
<br />
Basically, this is a fairly ugly problem, and the script requires babysitting while it runs. That's OK; it beats the alternative (running it sequentially while watching each run). I've tried to adhere to DRY (don't repeat yourself) as much as possible, but I suspect this could be improved upon.<br />
<br />
I'm blogging this because I suspect there are other people out there who have to do the same sort of thing with their data. It doesn't have to be a mining problem; it can be anything that requires intensive computation across voluminous data with an executable that wasn't designed with a Python API.<br />
<br />
Notes: <br />
<br />
1) I've omitted the file multirunparameters.py, which the script imports. It has a bunch of paths and names that are relevant to my project, but not to the reader's programming needs.<br />
<br />
2) Python 2.7 is listed at the top of the file as "mpython." This is the Python that our mine planning vendor ships, and it ties into their quite capable Python API. The executable I call with subprocess.Popen() is a Windows executable provided by a consultant independent of the mine planning vendor. It makes sense to package this interpolation inside the mine planning vendor's multirun (~ batch file) framework as part of the overall processing of the 3D geologic block model. The script exits as soon as this part of the batch is complete; I've inserted a 10-second pause at the end just to allow a quick look before the window disappears.<br />
<br />
<span style="font-family: "Courier New", Courier, monospace;">#!C:/MineSight/x64/mpython</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">"""<br />Interpolate grades with &lt;consultant&gt; program<br />from text files.<br />"""</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">import argparse</span><br />
<span style="font-family: "Courier New", Courier, monospace;">import subprocess as subx<br />import os<br />import collections as colx</span><br />
<span style="font-family: "Courier New", Courier, monospace;">import time<br />from datetime import datetime as dt</span><br />
<br />
<span style="font-family: "Courier New", Courier, monospace;"># Lookup file of constants, pit names, assay names, paths, etc.<br />import multirunparameters as paramsx</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">parser = argparse.ArgumentParser()<br /># 4 letter argument like 'kwat'<br /># Feed in at command line.<br />parser.add_argument('pit', help='four letter, lower case pit abbreviation (kwat)', type=str)<br />args = parser.parse_args()<br />PIT = args.pit</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">pitdir = paramsx.PATHS[PIT]<br />pathx = paramsx.BASEPATH.format(pitdir)<br />controlfilepathx = paramsx.CONTROLFILEPATH.format(pitdir)</span><br />
<br />
<span style="font-family: "Courier New", Courier, monospace;">timestart = dt.now()</span><span style="font-family: "Courier New", Courier, monospace;"><br />print(timestart)</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">PROGRAM = 'C:/MSPROJECTS/EOMReconciliation/2014/Multirun/AllPits/consultantprogram.exe'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">ENDTEXT = 'END &lt;consultant&gt; REPORT'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;"># These names are the only real difference between pits.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"># Double quote is for subprocess.Popen object's stdin.write method<br /># - Windows path breaks on colon without quotes.<br />ASSAY1DRIVER = 'KDriverASSAY1{:s}CBT.csv"'.format(PIT)<br />ASSAY2DRIVER = 'KDriverASSAY2{:s}CBT.csv"'.format(PIT)<br />ASSAY3DRIVER = 'KDriverASSAY3_{:s}CBT.csv"'.format(PIT)<br />ASSAY4DRIVER = 'KDriverASSAY4_{:s}CBT.csv"'.format(PIT)<br />ASSAY5DRIVER = 'KDriverASSAY5_{:s}CBT.csv"'.format(PIT)</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">RETCHAR = '\n'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">ASSAY1 = 'ASSAY1'<br />ASSAY2 = 'ASSAY2'<br />ASSAY3 = 'ASSAY3'<br />ASSAY4 = 'ASSAY4'<br />ASSAY5 = 'ASSAY5'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">NAME = 'name'<br />DRFILE = 'driver file'<br />OUTPUT = 'output'<br />DATFILE = 'data file'<br />RPTFILE = 'report file'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;"># data, report files<br />ASSAY1K = 'ASSAY1K.csv'<br />ASSAY1RPT = 'ASSAY1.RPT'</span><br />
<span style="font-family: "Courier New", Courier, monospace;">ASSAY2K = 'ASSAY2K.csv'<br />ASSAY2RPT = 'ASSAY2.RPT'</span><br />
<span style="font-family: "Courier New", Courier, monospace;">ASSAY3K = 'ASSAY3K.csv'<br />ASSAY3RPT = 'ASSAY3.RPT'</span><br />
<span style="font-family: "Courier New", Courier, monospace;">ASSAY4K = 'ASSAY4K.csv'<br />ASSAY4RPT = 'ASSAY4.RPT'</span><br />
<span style="font-family: "Courier New", Courier, monospace;">ASSAY5K = 'ASSAY5K.csv'<br />ASSAY5RPT = 'ASSAY5.RPT'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">OUTPUTFMT = '{:s}output.txt'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">ASSAYS = {1:{NAME:ASSAY1,<br /> DRFILE:controlfilepathx + ASSAY1DRIVER,<br /> OUTPUT:pathx + OUTPUTFMT.format(ASSAY1),<br /> DATFILE:pathx + ASSAY1K,<br /> RPTFILE:pathx + ASSAY1RPT},<br /> 2:{NAME:ASSAY2,<br /> DRFILE:controlfilepathx + ASSAY2DRIVER,<br /> OUTPUT:pathx + OUTPUTFMT.format(ASSAY2),<br /> DATFILE:pathx + ASSAY2K,<br /> RPTFILE:pathx + ASSAY2RPT},<br /> 3:{NAME:ASSAY3,<br /> DRFILE:controlfilepathx + ASSAY3DRIVER,<br /> OUTPUT:pathx + OUTPUTFMT.format(ASSAY3),<br /> DATFILE:pathx + ASSAY3K,<br /> RPTFILE:pathx + ASSAY3RPT},<br /> 4:{NAME:ASSAY4,<br /> DRFILE:controlfilepathx + ASSAY4DRIVER,<br /> OUTPUT:pathx + OUTPUTFMT.format(ASSAY4),<br /> DATFILE:pathx + ASSAY4K,<br /> RPTFILE:pathx + ASSAY4RPT},<br /> 5:{NAME:ASSAY5,<br /> DRFILE:controlfilepathx + ASSAY5DRIVER,<br /> OUTPUT:pathx + OUTPUTFMT.format(ASSAY5),<br /> DATFILE:pathx + ASSAY5K,<br /> RPTFILE:pathx + ASSAY5RPT}}</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">DELFILE = 'delete file'<br />INTERP = 'interp'<br />SLEEP = 'sleep'<br />MSGDRIVER = 'message driver'<br />MSGRETCHAR = 'message return character'<br />FINISHED1 = 'finished one assay'<br />FINISHEDALL = 'finished all interpolations'<br />TIMEELAPSED = 'time elapsed'<br />FILEEXISTS = 'report file exists'<br />DATSIZE = 'data file size'<br />DONE = 'number interpolations finished'<br />DATFILEEXIST = 'data file not yet there'<br />SIZECHANGE = 'report file changed size'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;"># for converting to megabyte file size from os.stat()<br />BITSHIFT = 20</span><br />
<span style="font-family: "Courier New", Courier, monospace;"># sleeptime - 5 seconds<br />SLEEPTIME = 5</span><br />
<span style="font-family: "Courier New", Courier, monospace;">FINISHED = 'finished'</span><br />
<span style="font-family: "Courier New", Courier, monospace;">RPTFILECHSIZE = """<br /> <br />Report file for {:s}<br />changed size; killing process . . .</span><br />
<span style="font-family: "Courier New", Courier, monospace;">"""</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">MESGS = {DELFILE:'\n\nDeleting {} . . .\n\n',<br /> INTERP:'\n\nInterpolating {:s} . . .\n\n',<br /> SLEEP:'\nSleeping 2 seconds . . .\n\n',<br /> MSGDRIVER:'\n\nWriting driver file name to stdin . . .\n\n',<br /> MSGRETCHAR:'\n\nWriting retchar to stdin for {:s} . . .\n\n',<br /> FINISHED1:'\n\nFinished {:s}\n\n',<br /> FINISHEDALL:'\n\nFinished interpolation.\n\n',<br /> TIMEELAPSED:'\n\n{:d} elapsed seconds\n\n',<br /> FILEEXISTS:'\n\nReport file for {:s} exists . . .\n\n',<br /> DATSIZE:'\n\nData file size for {:s} is now {:d}MB . . .\n\n',<br /> DONE:'\n\n{:d} out of {:d} assays are finished . . .\n\n',<br /> DATFILEEXIST:"\n\n{:s} doesn't exist yet . . .\n\n",<br /> SIZECHANGE:RPTFILECHSIZE}</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">def cleanslate():<br /> """<br /> Delete all output files prior to interpolation<br /> so that their existence can be tracked.<br /> """<br /> for key in ASSAYS:<br /> files = (ASSAYS[key][DATFILE],<br /> ASSAYS[key][RPTFILE],<br /> ASSAYS[key][OUTPUT])<br /> for filex in files:<br /> print(MESGS[DELFILE].format(filex))<br /> if os.path.exists(filex) and os.path.isfile(filex):<br /> os.remove(filex)<br /> return 0</span><br />
<br />
<span style="font-family: "Courier New", Courier, monospace;">def startprocess(assay):<br /> """<br /> Start &lt;consultant program&gt; run for given interpolation.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> Return subprocess.Popen object,<br /> file object (output file).<br /> """<br /> print(MESGS[INTERP].format(ASSAYS[assay][NAME]))<br /> # XXX - I hate time.sleep - hack<br /> # XXX - try to re-route standard output so that<br /> # it's not all jumbled together.<br /> print(MESGS[SLEEP])<br /> time.sleep(2)<br /> # output file for stdout<br /> f = open(ASSAYS[assay][OUTPUT], 'w')<br /> procx = subx.Popen(PROGRAM, stdin=subx.PIPE, stdout=f)<br /> print(MESGS[SLEEP])<br /> time.sleep(2)<br /> # XXX - problem, starting up Excel CBT 22JUN2014<br /> # Ah - this is what happens when the &lt;software usb licence&gt;<br /> # key is not attached :-(<br /> print(MESGS[MSGDRIVER])<br /> print('\ndriver file = {:s}\n'.format(ASSAYS[assay][DRFILE]))<br /> procx.stdin.write(ASSAYS[assay][DRFILE])<br /> print(MESGS[SLEEP])<br /> time.sleep(2)<br /> # XXX - this is so jacked up -<br /> # no idea what is happening when<br /> print(MESGS[MSGRETCHAR].format(ASSAYS[assay][NAME]))<br /> procx.stdin.write(RETCHAR)<br /> print(MESGS[SLEEP])<br /> time.sleep(2)<br /> print(MESGS[MSGRETCHAR].format(ASSAYS[assay][NAME]))<br /> procx.stdin.write(RETCHAR)<br /> print(MESGS[SLEEP])<br /> time.sleep(2)<br /> return procx, f</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">def crosslookup(assay):<br /> """<br /> From assay string, get numeric<br /> key for ASSAYS dictionary.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> Returns integer.<br /> """<br /> for key in ASSAYS:<br /> if assay == ASSAYS[key][NAME]:<br /> return key<br /> return 0</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">def checkprocess(assay, assaydict):<br /> """<br /> Check to see if assay<br /> interpolation is finished.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> assay is the item in question<br /> (ASSAY1, ASSAY2, etc.).</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> assaydict is the operating dictionary<br /> for the assay in question.</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> Returns True if finished.<br /> """<br /> # Report file indicates process finished.<br /> assaykey = crosslookup(assay)<br /> rptfile = ASSAYS[assaykey][RPTFILE]<br /> datfile = ASSAYS[assaykey][DATFILE]<br /> if os.path.exists(datfile) and os.path.isfile(datfile):<br /> # Report size of file in MB.<br /> datfilesize = os.stat(datfile).st_size >> BITSHIFT<br /> print(MESGS[DATSIZE].format(assay, datfilesize))<br /> else:<br /> # Doesn't exist yet.<br /> print(MESGS[DATFILEEXIST].format(datfile))<br /> if os.path.exists(rptfile) and os.path.isfile(rptfile):<br /> # XXX - not the most efficient way,<br /> # but checking the file this way appears<br /> # to work best.<br /> with open(rptfile, 'r') as f:<br /> txt = f.read()<br /> # XXX - hack - gah.<br /> if ENDTEXT in txt:<br /> # End-of-report marker found; the run is done.<br /> print(MESGS[SIZECHANGE].format(assay))<br /> print(MESGS[SLEEP])<br /> time.sleep(2)<br /> return True<br /> return False</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">PROCX = 'process'<br />OUTPUTFILE = 'output file'</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;"># Keeps track of files and progress of &lt;consultant program&gt;.<br />opdict = colx.OrderedDict()</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;"># get rid of preexisting files<br />cleanslate()</span><br />
<span style="font-family: "Courier New", Courier, monospace;"><br /># start all five roughly in parallel<br /># ASSAYS keys are numbers<br />for key in ASSAYS:<br /> # opdict - ordered with assay names as keys<br /> namex = ASSAYS[key][NAME]<br /> opdict[namex] = {}<br /> assaydict = opdict[namex]<br /> assaydict[PROCX], assaydict[OUTPUTFILE] = startprocess(key)<br /> # Initialize active status of process.<br /> assaydict[FINISHED] = False</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;"># For count.<br />numassays = len(ASSAYS)<br /># Loop until all finished.<br />while True:<br /> # Cycle until done then break.<br /> # Sleep SLEEPTIME seconds at a time and check between.<br /> time.sleep(SLEEPTIME)<br /> # Count.<br /> i = 0<br /> for key in opdict:<br /> assaydict = opdict[key]<br /> if not assaydict[FINISHED]:<br /> status = checkprocess(key, assaydict)<br /> if status:<br /> # Kill process once the report file is complete,<br /> # reap it, and close the stdout file from startprocess().<br /> assaydict[PROCX].kill()<br /> assaydict[PROCX].wait()<br /> assaydict[OUTPUTFILE].close()<br /> assaydict[FINISHED] = True<br /> i += 1<br /> else:<br /> i += 1<br /> print(MESGS[DONE].format(i, numassays))<br /> # all done<br /> if i == numassays:<br /> break</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">print('\n\nFinished interpolation.\n\n')<br />timeend = dt.now()<br />elapsed = timeend - timestart</span><br />
<span style="font-family: "Courier New", Courier, monospace;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;">print(MESGS[TIMEELAPSED].format(elapsed.seconds))<br />print('\n\n{:d} elapsed minutes\n\n'.format(elapsed.seconds // 60))</span><br />
<span style="font-family: Courier New;"></span><br />
<span style="font-family: "Courier New", Courier, monospace;"># Allow quick look at screen.<br />time.sleep(10)</span><br />
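On the DRY point: the five ASSAYS entries differ only in the assay number and the driver file name, so the dictionary could be built in a loop. This is a hypothetical refactor sketch and has not been tested against the real program. build_assays() is an illustrative name, and the example paths are placeholders. The driver file name formats are kept in an explicit list because they are not spelled consistently: ASSAY3 through ASSAY5 have an underscore before the pit name, and ASSAY1 and ASSAY2 don't.

```python
# Keys used in the post's ASSAYS dictionary.
NAME = 'name'
DRFILE = 'driver file'
OUTPUT = 'output'
DATFILE = 'data file'
RPTFILE = 'report file'

OUTPUTFMT = '{:s}output.txt'

# Driver file name formats, copied from the post. The trailing double
# quote closes the quoted Windows path started in controlfilepathx.
DRIVERFMTS = ['KDriverASSAY1{:s}CBT.csv"',
              'KDriverASSAY2{:s}CBT.csv"',
              'KDriverASSAY3_{:s}CBT.csv"',
              'KDriverASSAY4_{:s}CBT.csv"',
              'KDriverASSAY5_{:s}CBT.csv"']


def build_assays(pit, pathx, controlfilepathx):
    """Return {1: {...}, ..., 5: {...}} shaped like the post's ASSAYS."""
    assays = {}
    for number, driverfmt in enumerate(DRIVERFMTS, start=1):
        name = 'ASSAY{:d}'.format(number)
        assays[number] = {NAME: name,
                          DRFILE: controlfilepathx + driverfmt.format(pit),
                          OUTPUT: pathx + OUTPUTFMT.format(name),
                          DATFILE: pathx + name + 'K.csv',
                          RPTFILE: pathx + name + '.RPT'}
    return assays


if __name__ == '__main__':
    # Hypothetical paths; the real ones come from multirunparameters.
    assays = build_assays('kwat', 'C:/pits/kwat/', '"C:/pits/kwat/')
    for key in sorted(assays):
        print(key, assays[key][DRFILE])
```

In the script, this would replace the five ASSAYnDRIVER constants and the literal dictionary with a single call: `ASSAYS = build_assays(PIT, pathx, controlfilepathx)`.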
<br />
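The monitoring loop could also be written to remove each finished job from a pending set, kill the process only if it is still running, reap it with wait(), and close its stdout file. This is a minimal sketch under those assumptions. report_done() and wait_for_reports() are hypothetical names, and jobs is assumed to map a name to a (Popen object, open output file, report file path) tuple.

```python
import os
import time


def report_done(rptfile, endtext):
    """True once rptfile exists and contains the end-of-report marker."""
    if not os.path.isfile(rptfile):
        return False
    with open(rptfile) as f:
        return endtext in f.read()


def wait_for_reports(jobs, endtext, sleeptime=5):
    """Poll until every job's report contains endtext.

    jobs maps a name to (Popen object, open stdout file, report path).
    """
    pending = dict(jobs)
    while pending:
        time.sleep(sleeptime)
        for name in list(pending):
            proc, outfile, rptfile = pending[name]
            if report_done(rptfile, endtext):
                if proc.poll() is None:  # still running, so stop it
                    proc.kill()
                proc.wait()              # reap the process
                outfile.close()          # release the stdout file
                del pending[name]
        print('{:d} out of {:d} finished'.format(len(jobs) - len(pending),
                                                 len(jobs)))


if __name__ == '__main__':
    import subprocess
    import sys
    import tempfile

    folder = tempfile.mkdtemp()
    rpt = os.path.join(folder, 'ASSAY1.RPT')
    # Hypothetical child: writes a finished report, then keeps running,
    # so it has to be killed.
    child = ("import time\n"
             "open({!r}, 'w').write('END TEST REPORT')\n"
             "time.sleep(60)\n").format(rpt)
    out = open(os.path.join(folder, 'ASSAY1output.txt'), 'w')
    proc = subprocess.Popen([sys.executable, '-c', child], stdout=out)
    wait_for_reports({'ASSAY1': (proc, out, rpt)}, 'END TEST REPORT',
                     sleeptime=1)
```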
<br />
Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com2tag:blogger.com,1999:blog-524230429673765509.post-80748217723146849602014-10-12T23:42:00.001-07:002014-10-13T14:21:21.969-07:00Downloading a Bunch of MP3's off the Internet (Foreign Language Tapes)A mining buddy, Jen, wrote a <a href="http://adventuresincongo.wordpress.com/2014/10/12/how-the-brain-is-badly-wired/" target="_blank">blog post</a> lamenting the difficulty of learning a foreign language as an adult in a far-off land. This inspired me to clean up and publish a script I wrote for myself to download the Foreign Service Institute French "tapes" (mp3's, actually).<br />
<br />
I'm not very well versed in web programming; this script came out of necessity. There may be other, more efficient ways to do this. If you have a slow connection, a piecemeal approach will probably be required. It took about 20 minutes to get all these files over a decent Verizon MiFi connection (unfortunately, I don't have speed metrics available).<br />
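One way to make the piecemeal approach painless is to skip files that already exist and download each file under a temporary name first. That way, rerunning the script after a dropped connection picks up where it stopped instead of starting over. This sketch assumes a list of (url, filename) pairs. download_missing() and quoted_url() are hypothetical helpers, not taken from the script in this post. The sketch also percent-encodes the path, because the FSI URLs contain spaces and parentheses, and newer Python 3 releases may reject a URL that contains a literal space.

```python
import os
from urllib.parse import quote
from urllib.request import urlretrieve


def quoted_url(base, path):
    """Join base and path, percent-encoding the path but keeping '/'."""
    return base + quote(path)


def download_missing(jobs, fetch=urlretrieve):
    """Fetch each (url, filename) pair unless filename already exists.

    Returns the filenames downloaded on this run.
    """
    fetched = []
    for url, filename in jobs:
        if os.path.exists(filename):
            continue  # already downloaded on an earlier run
        partial = filename + '.part'
        fetch(url, partial)            # a dropped connection leaves only .part
        os.replace(partial, filename)  # rename once the download is complete
        fetched.append(filename)
    return fetched


if __name__ == '__main__':
    # Path pieces taken from the post's base URL constants.
    print(quoted_url('http://www.fsi-language-courses.org/Courses/',
                     'French/Basic (Revised)/Volume 1/'))
    # http://www.fsi-language-courses.org/Courses/French/Basic%20%28Revised%29/Volume%201/
```

The fetch parameter defaults to urlretrieve but can be swapped out, which makes it easy to test the skip-and-resume logic without touching the network.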
<br />
<b>Notes about the downloaded product:</b> the US State Department's language tapes and lessons were mostly written and produced 30 to 50 years ago. It's not Rosetta Stone, but I have found them to have value when it comes to practicing pronunciation, including cadence and rhythm of the foreign language - things you just can't get from printed or displayed text.<br />
<br />
Before the internet age, my late wife gave me some Spanish tapes that helped me out. I am by no means fluent in Spanish, but I can say<span id="result_box" lang="es"><i><span class="hps"> Hacemos lo</span><span class="hps"> que podemos</span> <span class="hps">hasta que nos</span> </i><span class="hps"><i>boten</i> (roughly, "We do what we can until they throw us out"; this may not be entirely grammatically correct) to the Spanish-speaking mining engineers and get a laugh.</span></span><br />
<div class="almost_half_cell" id="gt-res-content">
<div dir="ltr" style="-ms-zoom: 1;">
<span id="result_box" lang="es"><span class="hps"><br />The original names of the mp3's are unnecessarily long and look as though they were created by the Department of Redundancy Department. It's a government thing, but it does not reflect on the quality of the product. While the tapes' subject matter is at times sociologically and technologically dated, the foreign languages haven't changed all that much.</span></span></div>
<div dir="ltr" style="-ms-zoom: 1;">
<span id="result_box" lang="es"><span class="hps"><br /><b>The script: </b>I used Python 3.4 with the urllib.request module. The main challenge was getting the URLs of the mp3's right; the names are not entirely consistent. For help with this (I am using Firefox 24.3.0 on OpenBSD 5.4), I right-clicked on the mp3's link and selected Inspect Element from the drop-down menu:</span></span></div>
<div dir="ltr" style="-ms-zoom: 1;">
<span id="result_box" lang="es"><span class="hps"><br /></span></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEityOsfab5T1TGajFIHhbWZMXHL0XBLsTW9ZV6CkZ4N_9YxMKdF5bdKaEg0OC2u8KwWvhsJ02SyDhTho0_fQU8bIsrSG4DeiyoLF-66AkVF6G388XPF7BEmGOH1Kx0yvbjliPoiVkmWl7k/s1600/inspectelement.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEityOsfab5T1TGajFIHhbWZMXHL0XBLsTW9ZV6CkZ4N_9YxMKdF5bdKaEg0OC2u8KwWvhsJ02SyDhTho0_fQU8bIsrSG4DeiyoLF-66AkVF6G388XPF7BEmGOH1Kx0yvbjliPoiVkmWl7k/s1600/inspectelement.png" height="186" width="320" /></a></div>
<div dir="ltr" style="-ms-zoom: 1;">
<span id="result_box" lang="es"><span class="hps"></span></span><br /></div>
</div>
<div class="almost_half_cell" id="gt-res-content">
<div dir="ltr" style="-ms-zoom: 1;">
<span id="result_box" lang="es"><span class="hps"><br /></span></span></div>
<div dir="ltr" style="-ms-zoom: 1;">
<span id="result_box" lang="es"><span class="hps">The lower left pane shows the href with the link to the mp3. If your script can't find a file, this is a convenient place to look.<br /><br />This is the whole thing:</span></span></div>
<div dir="ltr" style="-ms-zoom: 1;">
<span style="font-family: "Courier New",Courier,monospace;"><span id="result_box" lang="es"><span class="hps"><br /></span></span></span></div>
<div dir="ltr" style="-ms-zoom: 1;">
<span style="font-family: "Courier New",Courier,monospace;"> </span><br />
<div class="almost_half_cell" id="gt-res-content">
<div dir="ltr" style="-ms-zoom: 1;">
<span id="result_box" lang="es"><span class="hps"><span style="font-family: "Courier New",Courier,monospace;">#!python3.4<br /><br />from urllib import request<br /><br /># For getting foreign language study mp3's.<br /># Main part of URL for French.<br />BASEURL = 'http://www.fsi-language-courses.org/Courses/'<br />MIDDLEURLI = 'French/Basic (Revised)/Volume {volume}/'<br />MIDDLEURLII = 'French/Basic (Revised)/Volume {0:s}/'<br />BASEURLEND = 'FSI - French Basic Course (Revised) '<br /><br /># Format changes inexplicably at chapter 19.<br /># Grrrr . . .<br />URLI = BASEURL + MIDDLEURLI + BASEURLEND<br />URLI += '- Volume {volume} - Unit {unit:0>2d} '<br />URLI += '{unit:0>2d}.{section:0>2d}.mp3'<br /><br />URLII = BASEURL + MIDDLEURLII + BASEURLEND<br />URLII += '- Volume {1[volume]:d} - Unit {1[unit]:0>2d} '<br />URLII += '{1[unit]:0>2d}.{1[section]:d}.mp3'<br /><br /># Format for actual name of mp3 files.<br /># This is what I wanted for a name - your<br /># preferences may be different - adjust<br /># accordingly.<br />FILENAME = '{unit:0>2d}{section:0>2d}.mp3'<br /><br /># Texts (pdf format).<br /># Everything the State Dept. does is a 'StudentText' -<br /># fair enough.<br />STUDENTTXT = 'StudentText.pdf'<br /><br />PDFURLBASICTEXT1 = 'http://ia601400.us.archive.org/28/items/'<br />PDFURLBASICTEXT1 += 'Fsi-FrenchBasicCourserevised-StudentText/'<br />PDFURLBASICTEXT1 += 'Fsi-FrenchBasicCourserevised-Volume1-'<br /><br />PDFURLBASICTEXT2 = 'http://ia801400.us.archive.org/28/items/'<br />PDFURLBASICTEXT2 += 'Fsi-FrenchBasicCourserevised-StudentText/'<br />PDFURLBASICTEXT2 += 'Fsi-FrenchBasicCourserevised-Volume2-'<br /><br />PDFURLMONDEFR = 'http://ia600406.us.archive.org/3/items/'<br />PDFURLMONDEFR += 'Fsi-LeMondeFrancophone/Fsi-LeMondeFrancophone-'<br /><br />TWO = 'Two'<br /><br /># Tack on StudentText.pdf to end.<br />pdfs = [PDFURLBASICTEXT1, PDFURLBASICTEXT2, PDFURLMONDEFR]<br />pdfs = [pdfx + STUDENTTXT for pdfx in pdfs]<br />myfilenames = ['basictext1.pdf', 'basictext2.pdf', 'mondefrancophone.pdf']<br /># I'm using the dictionary keys for filenames.<br />pdfs = dict(zip(myfilenames, pdfs))<br /><br />VOLUME = 'volume'<br />UNIT = 'unit'<br />SECTION = 'section'<br /><br /># volume key, then list of two tuples of unit and<br /># number of sections<br />VOLUMES = {1:[(1, 6), (2, 6), (3, 6), (4, 7), (5, 7),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(6, 3), (7, 11), (8, 10), (9, 11), (10, 9),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(11, 9), (12, 4)],<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2:[(13, 8), (14, 9), (15, 10), (16, 9), (17, 11),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(18, 7), (19, 9), (20, 8), (21, 8), (22, 7),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(23, 8), (24, 6)]}<br /><br />mp3s = []<br />for key in VOLUMES:<br />&nbsp;&nbsp;&nbsp;&nbsp;for unitsection in VOLUMES[key]:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for x in range(1, unitsection[1] + 1):<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mp3s.append({VOLUME:key, UNIT:unitsection[0], SECTION:x})<br /><br />for mp3x in mp3s:<br />&nbsp;&nbsp;&nbsp;&nbsp;# Name format change at chapter 19 :-(<br />&nbsp;&nbsp;&nbsp;&nbsp;if mp3x[UNIT] > 18:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;urlx = URLII.format(TWO, mp3x)<br />&nbsp;&nbsp;&nbsp;&nbsp;else:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;urlx = URLI.format(**mp3x)<br />&nbsp;&nbsp;&nbsp;&nbsp;filenamex = FILENAME.format(**mp3x)<br />&nbsp;&nbsp;&nbsp;&nbsp;print('Retrieving {0} . . .'.format(urlx))<br />&nbsp;&nbsp;&nbsp;&nbsp;request.urlretrieve(urlx, filenamex)<br /><br /># Add pdf texts at end.<br />for pdfx in pdfs:<br />&nbsp;&nbsp;&nbsp;&nbsp;print('Retrieving {0} . . .'.format(pdfx))<br />&nbsp;&nbsp;&nbsp;&nbsp;request.urlretrieve(pdfs[pdfx], pdfx)<br /><br />print('Everything appears to have downloaded.')<br />print('Check the directory with the files to be sure.')</span></span></span></div>
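One caveat if you run this today: the script was written for Python 3.4, and the URLs it builds contain literal spaces. As far as I know, recent Python 3 releases refuse to send a request whose path contains a space (http.client raises InvalidURL), so the paths probably need percent-encoding first. Here is a minimal sketch under that assumption. It is not part of the original script, and quotedurl and tryretrieve are hypothetical helper names. quotedurl encodes only the path with urllib.parse.quote. tryretrieve reports a missing file (for example, a 404 from a misspelled name) instead of stopping the whole run.

```python
from urllib import error, parse, request


def quotedurl(url):
    """Percent-encode the path part of url (spaces become %20).

    Slashes and parentheses are left as-is. An already-encoded path would
    be encoded a second time, so only pass raw, unencoded URLs.
    """
    parts = parse.urlsplit(url)
    return parse.urlunsplit(parts._replace(path=parse.quote(parts.path, safe='/()')))


def tryretrieve(url, filename):
    """Download url to filename; print a message and return False on failure."""
    try:
        request.urlretrieve(url, filename)
    except error.URLError as err:
        # HTTPError (e.g. 404 Not Found) is a subclass of URLError.
        print('Could not retrieve {0}: {1}'.format(url, err))
        return False
    return True
```

In the main loop, `request.urlretrieve(urlx, filenamex)` would then become `tryretrieve(quotedurl(urlx), filenamex)`. The return value tells you which files to hunt down with Inspect Element.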
<div dir="ltr" style="-ms-zoom: 1;">
<span id="result_box" lang="es"><span class="hps"><span style="font-family: "Courier New",Courier,monospace;"> </span></span></span></div>
<div dir="ltr" style="-ms-zoom: 1;">
<span id="result_box" lang="es"><span class="hps"><span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;"><span style="font-family: inherit;">As for my French efforts, I've had better luck downloading this stuff than learning it. Nonetheless, a quick message to Guido van Rossum and the other core devs: <i>transmettez-leur mon meilleur souvenir</i> (give them my best regards).</span></span> </span></span></span></div>
</div>
</div>
</div>
Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com1tag:blogger.com,1999:blog-524230429673765509.post-42563301358741783412014-10-06T03:24:00.001-07:002014-10-06T15:58:11.368-07:00Event report: pycon.za<div class="separator" style="clear: both; text-align: center;">
</div>
I managed to squeeze a 4-day stop in Johannesburg into a recent trip, and it happily coincided with pycon.za. I love pycon.us and all the other big conferences, but for value, these smaller localized cons can't be beat.<br />
<br />
Venue: The Campus, Bryanston<br />
<br />
Not your average office park. It's nicely landscaped and has a huge center beach or pitch or lawn (depending on where you're from). The buildings are all named after famous sports venues like Le Mans. The nod to us Yanks (NOT New York Yankees) in Wrigley Field was a nice touch.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEir4uV4pMFP5UUyNQeOkp56vkrUzKkZbIy9RiGw6w7n-FhyphenhyphenRBE1yyd5exwZ_EVS3AhQip4E5on-YtCMs3vOnCiCvtan2Qm354G9K7lN7XF_rsFEpONvxbvxtEMY-Hem1hmVSBiwQ5Wf3XE/s1600/multiplestreliziagood.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEir4uV4pMFP5UUyNQeOkp56vkrUzKkZbIy9RiGw6w7n-FhyphenhyphenRBE1yyd5exwZ_EVS3AhQip4E5on-YtCMs3vOnCiCvtan2Qm354G9K7lN7XF_rsFEpONvxbvxtEMY-Hem1hmVSBiwQ5Wf3XE/s1600/multiplestreliziagood.jpg" height="240" width="320" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEPTkU0qFuX7tScGHMLHXvy-3tG5jmFILKb3zvkS_Ha1QahoCbPSCHeBHCGgJ3AA9MR1bAtjKZJ5B76nmdFHNVat858MRrXo6ba2lxgT2NAynNE7LmLbyyEMER04vmr77rl8WJtWUPGn4/s1600/campus.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEPTkU0qFuX7tScGHMLHXvy-3tG5jmFILKb3zvkS_Ha1QahoCbPSCHeBHCGgJ3AA9MR1bAtjKZJ5B76nmdFHNVat858MRrXo6ba2lxgT2NAynNE7LmLbyyEMER04vmr77rl8WJtWUPGn4/s1600/campus.png" height="277" width="320" /></a></div>
<br />
<br />
Best of all, there was 100MB/day of internet for everyone who attended. That's not enough if you want to watch YouTube videos, but plenty if you just want to check a speaker bio or do con-related stuff. I thought the organizers did a great job of keeping the con inexpensive but valuable.<br />
<br />
The catered food and drinks were really good, by my standards at least.<br />
<br />
Apart from an unfortunate plumbing problem in the men's bathroom on the second day, which was quickly repaired, everything went off without a hitch.<br />
<br />
Talks that I went to:<br />
<br />
<a href="https://za.pycon.org/talks/28/" target="_blank">Ludell-Doughtie Writing Python Code to Decide an Election</a> Keynote - he outlined the methodology and process they used during a recent (Libyan? - there was Arabic right-to-left text in the data) election.<br />
<br />
The main take-aways for me were<br />
<ol>
<li>Use pre-written, open source software packages to standardize things, because you won't have time to roll your own or dink with inconsistent data/code formats when you are in the thick of it. </li>
<li>It's a huge responsibility to write code for an election and manage the data, but it's a cool project.</li>
</ol>
<a href="https://za.pycon.org/talks/49/" target="_blank"> Steve Crawford Enabling Science with the Southern African Large Telescope with Python</a> Doctor Crawford didn't show a lot of code in this talk, but he did outline the architecture for getting information and moving it around. The scope of the talk was way too big for code samples, but that's OK. I left feeling . . . shall we say . . . inspired . . .
<br />
<br />
My main takeaways:
<br />
<br />
Astronomy is wickedly cool and based on instrumentation, precision, and data paucity and, ironically, an overabundance of data (on average about 10GB/day, up to 50GB/day). Crawford mentioned more than once the desperate need to "catch as many photons as possible because there are so few coming in." Yeah, photons, like particles of light, just wow.
<br />
<br />
Python is used everywhere it's appropriate to use it, and plenty of those problems don't require you to be a genius rocket scientist like Crawford: sysadmin, data, and, perhaps most importantly, web. They're using MySQL and a web frontend to distribute data throughout the world on a daily basis to other astronomers who need it. I'm always biased toward raw data myself; it is critical, but if you can't distribute it, it's not worth much.<br />
<br />
Good talk for me to attend.<br />
<br />
<a href="https://za.pycon.org/talks/5/" target="_blank">Albert Nel - Using Python in Blender</a> Nel is a total joker (in a respectful, entertaining, good way), but not enough of a joker to belie a serious love and enthusiasm for both Python and Blender.<br />
<br />
My own experience with rendering 3D stuff is a little dinking around with POV-ray. Blender is different in that it's big on animations and honoring the laws of physics. Writing Python to automate Blender is similar to, for lack of a better analogy, writing or recording VBA macros in Excel.<br />
<br />
Nel did a lotto ball live demo and a Lego movie ocean demo (aside: I *LOVE* live demos, even when they go wrong - it's one of the best parts of Open Source conferences versus, say, a godawful boring company PowerPoint presentation - thank you to the Nelster for accommodating us).<br />
<br />
My takeaways:<br />
<br />
Blender is fun.<br />
<a href="https://za.pycon.org/talks/50/" target="_blank"><br /></a>
<a href="https://za.pycon.org/talks/50/" target="_blank">Allison Randal The Earth is not Flat (and Other Heresies) </a>Keynote - I often don't relate much to keynotes because they're about super high-level programmer craft stuff (disclaimer: I've worked as a dev, but I'm a geologist by trade) that I can't really control or understand.<br />
<br />
So my mind wandered as Randal gracefully moved about the stage in her pixie frame and calmly laid down her knowledge. As a much younger man I would have been thinking, "She's so smart . . . and a very attractive individual to boot . . ." As a curmudgeonly old fart my thoughts go more towards "Damn - she's in perfect shape, speaks well, and knows what the hell she's talking about. I'm SOOO jealous; why can't I be like that?" In all seriousness, what always blows me away when I see Randal talk is the calm, matter-of-fact way she presents facts and opinions without any malice or belligerence.<br />
<br />
At one point she responded to a question by saying essentially, "Don't use AWS; use OpenStack &lt;if you want to accomplish X&gt;." Amazon was one of the three top corporate sponsors of the event, but it wasn't a SPEAK TRUTH TO POWER/VIVE LA REVOLUCION kind of thing, just a "this is what I think based on what I know."<br />
<br />
I'm glad she's with "us" (the open source community) instead of selling her soul to the commercial world (which she could do at great profit). <br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSCQqKCJnm0UPQNaIJOnkVvQFUg9T-Yb-bFLN4HAd97hpTUVhyphenhyphenbnW5QzNNq9UF9eVfyAXI-p4YFhmekevFaxCE1kqqettboRD7OSf4ykQNEH9NeN7to3ZlZ2em_4hVce3CPBaH08sB4po/s1600/ninja.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSCQqKCJnm0UPQNaIJOnkVvQFUg9T-Yb-bFLN4HAd97hpTUVhyphenhyphenbnW5QzNNq9UF9eVfyAXI-p4YFhmekevFaxCE1kqqettboRD7OSf4ykQNEH9NeN7to3ZlZ2em_4hVce3CPBaH08sB4po/s1600/ninja.png" height="210" width="320" /></a></div>
Takeaway (tongue-in-cheek): my view of me vis-à-vis Allison Randal (I'm the guy on the right).<br />
<br />
They say "kill your heroes." Until I drop 40 lbs. and learn to express my ideas in a less conflict-ridden manner, I am not ready to kill anything. Sorry, Ms. Randal. I hope this isn't too creepy, but you're going to remain the queen on my hero pedestal for a while :-\<br />
<br />
<a href="https://za.pycon.org/talks/46/" target="_blank">Dr. David Mertz What I Learned About Python - and About Guido's Time Machine - from Reading the Python-Ideas Mailing List</a> Keynote - David took an example of an idea for a sum function for lists and walked through all the considerations of sanity, performance, implementation, and ultimate rejection.<br />
<br />
My takeaways:<br />
<br />
<ol>
<li>The idea has to be intuitive and make sense (he actually experimented with this sociologically - that was kind of cool).</li>
<li>The implementation has to be consistent.</li>
<li>Performance matters (a lot).</li>
<li style="text-align: left;">1 trumps 2 and 3.</li>
</ol>
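The sum-of-lists case shows why performance carries so much weight in these discussions. The snippet below is my own illustration, not code from the talk. sum() will concatenate lists if you give it an empty list as the start value. However, each + builds a brand-new list, so the total copying grows roughly quadratically with the number of sublists. itertools.chain.from_iterable copies each element only once.

```python
import itertools

lists = [[1, 2], [3], [4, 5, 6]]

# sum() accepts a list as its start value, but every step builds a new,
# longer list, so the copying grows roughly quadratically.
flatsum = sum(lists, [])

# chain.from_iterable walks the sublists lazily; list() copies each
# element once, so the work grows linearly.
flatchain = list(itertools.chain.from_iterable(lists))

print(flatsum == flatchain)  # True
```

Both give the same answer; the difference only shows up when there are many sublists.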
<br />
<a href="https://za.pycon.org/schedule/" target="_blank">Adrianna Pińska An Introduction to Regular Expressions in Python</a> Don't let the name fool you; this Polish lady speaks the Queen's English quite well. She apologized (sort of) ahead of time, saying she would talk too fast, but the talk was paced just right. I was really glad I went.<br />
<br />
My takeaways (for regex):<br />
<br />
<ol>
<li>Start with very general matches (.* for example) and work towards specific matches to gain skill and confidence.</li>
</ol>
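Here is what that advice might look like in practice: a small sketch using Python's re module. The sample line of mine-report text and the variable names are made up for illustration and are not from the talk. It starts with a catch-all .* and narrows the pattern one step at a time.

```python
import re

# Made-up sample text: a label, a tonnage and a grade.
line = 'Pit 3 bench 1450: 12,345 tonnes @ 0.82 g/t'

# Step 1, very general: everything after the first colon.
general = re.search(r':(.*)', line).group(1)

# Step 2, narrower: only the digits and commas just before "tonnes".
tonnage = re.search(r':\s*([\d,]+) tonnes', line).group(1)

# Step 3, specific: tonnage and grade captured as two separate groups.
detail = re.search(r':\s*([\d,]+) tonnes @ (\d+\.\d+) g/t', line).groups()

print(repr(general))
print(tonnage, detail)
```

Each step still matches. If a narrower pattern suddenly returns None, the last change is the one to look at.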
<a href="https://za.pycon.org/talks/8/" target="_blank">Ridhwana Khan A Journey Through the Eyes of a Newbie Female Developer</a> Very positive, professional talk, especially for a youngster.<br />
<br />
(Aside: it's none of my business, but I think Ms. Khan is Muslim - she wore this really cool black-red combination outfit with a red head scarf - I borked my picture with my point and shoot camera, but I think a video of the talk is online. Anyway, for a diversity-oriented talk, the outfit was not only cool and classy, but perfect for a South African con).<br />
<br />
Ridhwana's talk was well structured with some humor interjected. She started out with the most important point - that she loves coding and wants to do this for a career. There were a number of valid points and ideas put forward - it's worth checking it out online.<br />
<br />
My main takeaway: IIRC not once did Ridhwana mention a Code of Conduct policy, nor did she dwell on personal experiences with harassment. Essentially, she has had a pretty good experience with colleagues thus far. After a year with an all-male crew (her excepted), she learned that before her arrival, firm rules had been established regarding off-color humor (basically banned) and such. For me, this is a pretty good example of how some firm (but not excessively draconian) rules can help make programmer-land a women-friendly place. Ridhwana's point was that (at least in South African society) this is typically how relationships go anyway: you meet someone, get to know them better over time, and then you can loosen up a bit more as appropriate.<br />
<br />
Hallway track: there were fewer than 150 people at this con IIRC, so if you wanted to talk to anyone, there was time. People involved with the new Square Kilometre Array telescope project, people involved with the older telescopes northeast of Cape Town, speakers, Dr. Mertz, Allison Randal, a PhD in computational mathematics who specializes in computer vision, South African devs, the organizers of the conference - where else could a grunt open pit mine geologist like me have access to such luminosity? pycon.za is pretty sweet.<br />
<br />Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com0tag:blogger.com,1999:blog-524230429673765509.post-19132042262899964822014-09-01T09:59:00.000-07:002018-09-18T07:05:54.377-07:00PDF - Removing Pages and Inserting Nested BookmarksI blogged <a href="http://pyright.blogspot.com/2014/03/editing-pdf-file-with-python-with.html" target="_blank">before</a> about PyPDF2 and some initial work I had done in response to a request to get a report from Microsoft SQL Server Reporting Services into PDF format. Since then I've had better luck with PyPDF2 using it with Python 3.4. Seldom do I need to make any adjustments to either the PDF file or my Python code to get things to work.<br />
<br />
Presented below is the code that is working for me now. The basic gist of it is to strip the blank pages (conveniently SSRS dumps the report with a blank page every other page) from the SSRS PDF dump and reinsert the bookmarks in the right places in a new final document. The report I'm doing is about 30 pages, so having bookmarks is pretty critical for presentation and usability.<br />
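The page arithmetic behind the blank-page stripping is easy to check on its own. Here is a PyPDF2-free sketch. keptpages and newindex are hypothetical names that roughly mirror getpages and matchpage in the script below. The sketch assumes, as with the SSRS dump, that each real page is followed by a blank one.

```python
def keptpages(numpages):
    """Pair each kept page's old 0-based index with its new index.

    Real pages sit at old indices 0, 2, 4, ... and land at 0, 1, 2, ...
    in the new document.
    """
    return [(old, old // 2) for old in range(0, numpages, 2)]


def newindex(oldindex):
    """Map any old index (real or blank page) to an index in the new document.

    A blank page (odd index) maps to the real page that follows it, so this
    assumes no bookmark points at a trailing blank page at the very end.
    """
    return (oldindex + 1) // 2


print(keptpages(6))
print([newindex(i) for i in range(5)])
```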
<br />
The approach I took was to get the bookmarks out of the PDF object model and into a nested dictionary that I could understand and work with easily. To keep the bookmarks in the right order for presentation I used collections.OrderedDict instead of a regular Python dictionary, which in Python 3.4 does not preserve insertion order. The code should work for any depth of nested parent-child PDF bookmarks. My report only goes three or four levels deep, but things can get fairly complex even at that level.<br />
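To make the shape of that nested dictionary concrete, here is a library-free sketch. It assumes the outline arrives the way PyPDF2's getOutlines returns it: each bookmark is optionally followed by a list of its children. It uses (title, page) tuples as stand-ins for the real Destination objects. nest and walk are hypothetical helpers that roughly mirror makenewbookmarks and dealwithbookmarks in the script below.

```python
from collections import OrderedDict

NAME = 'name'
CHILDREN = 'children'


def nest(outline):
    """Turn a PyPDF2-style outline list into nested OrderedDicts.

    Each (title, page) tuple stands in for a Destination; a list that
    follows an entry holds that entry's children. Keys are page indices,
    so this assumes one bookmark per page per level.
    """
    tree = OrderedDict()
    lastpage = None
    for item in outline:
        if isinstance(item, list):
            # A nested list belongs to the bookmark added just before it.
            tree[lastpage][CHILDREN] = nest(item)
        else:
            title, page = item
            tree[page] = {NAME: title, CHILDREN: OrderedDict()}
            lastpage = page
    return tree


def walk(tree, depth=0, parent='TOPLEVEL'):
    """Yield (depth, name, page, parent name) in document order, depth first."""
    for page, node in tree.items():
        yield depth, node[NAME], page, parent
        yield from walk(node[CHILDREN], depth + 1, node[NAME])


outline = [('Summary', 0), [('Method A', 1), [('Detail', 2)], ('Method B', 3)],
           ('Appendix', 5)]
for depth, name, page, parent in walk(nest(outline)):
    print('    ' * depth + '{0} (page {1}, parent {2})'.format(name, page, parent))
```

In the real script, each step of the walk corresponds to an output.addBookmark(name, page, parent) call, with the parent's returned bookmark object passed down instead of its name.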
<br />
There are a couple of artifacts of the actual report in the code: the name "comparisonreader" refers to the subject of the report, a comparison of the results of different accounting methods. I've tried to sanitize the code where appropriate, but missed a thing or two.<br />
<br />
It may be a bit overwrought (too much code), but it gets the job done. Thanks for having a look.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">#!C:\python34\python</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">"""<br />Strip out blank pages and keep bookmarks for<br />SQL Server SSRS dump of model comparison report (pdf).<br />"""</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">import PyPDF2 as pdf</span><span style="font-family: "courier new" , "courier" , monospace;"> </span><br />
<span style="font-family: "courier new" , "courier" , monospace;">import math</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">from collections import OrderedDict</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">INPUTFILE = 'SSRSdump.pdf' <br />OUTPUTFILE = 'Finalreport.pdf'</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">OBJECTKEY = '/A'<br />LISTKEY = '/D'</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># Adobe PDF document element keys.<br />FULLPAGE = '/Fit'<br />PAGE = '/Page'<br />PAGES = '/Pages'<br />ROOT = '/Root'<br />KIDS = '/Kids'<br />TITLE = '/Title'</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"># Python/PDF library types.<br />NODE = pdf.generic.Destination<br />CHILD = list</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">ADDPAGE = 'Adding page {0:d} from SSRS dump to page {1:d} of new document . . .'</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"># dictionary keys<br />NAME = 'name'<br />CHILDREN = 'children'</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">INDENT = 4 * ' '</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br />ADDEDBOOKMARK = 'Added bookmark {0:s} to parent bookmark {1:s} at depthlevel {2:d}.'</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">TOPLEVEL = 'TOPLEVEL'</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def getpages(comparisonreader):<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;From a PDF reader object, gets the<br />&nbsp;&nbsp;&nbsp;&nbsp;page numbers of the odd numbered pages<br />&nbsp;&nbsp;&nbsp;&nbsp;in the old document (SSRS dump) and<br />&nbsp;&nbsp;&nbsp;&nbsp;the corresponding page in the final<br />&nbsp;&nbsp;&nbsp;&nbsp;document.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;Returns a generator of two tuples.<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;# get number of pages then get odd numbered pages<br />&nbsp;&nbsp;&nbsp;&nbsp;# (even numbered indices)<br />&nbsp;&nbsp;&nbsp;&nbsp;numpages = comparisonreader.getNumPages()<br />&nbsp;&nbsp;&nbsp;&nbsp;return ((x, x // 2) for x in range(0, numpages, 2))</span><br />
<span style="font-family: "courier new";"></span><br />
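<span style="font-family: "courier new" , "courier" , monospace;">def fixbookmark(bookmark):<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;bookmark is a PyPDF2 bookmark object.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;Side effect function that changes bookmark<br />&nbsp;&nbsp;&nbsp;&nbsp;page display mode to full page.<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;# getObject yields a dictionary: /A is the bookmark's action<br />&nbsp;&nbsp;&nbsp;&nbsp;# and /D its destination array, whose element 1 is the fit mode.<br />&nbsp;&nbsp;&nbsp;&nbsp;bookmark.getObject()[OBJECTKEY][LISTKEY][1] = pdf.generic.NameObject(FULLPAGE)<br />&nbsp;&nbsp;&nbsp;&nbsp;return 0</span><br />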
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def matchpage(page, pages):<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;Find index of page match.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;page is a PyPDF2 page object.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;pages is the list (PyPDF2 array) of page objects.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;Returns integer page index in new (smaller) doc.<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;originalpageidx = pages.index(page)<br />&nbsp;&nbsp;&nbsp;&nbsp;return math.floor((originalpageidx + 1)/2)</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def pagedict(bookmark, pages):<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;Creates page dictionary for PyPDF2 bookmark object.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;bookmark is a PDF object (dictionary).</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;pages is a list of PDF page objects (dictionary).</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;Returns two tuple of a dictionary and<br />&nbsp;&nbsp;&nbsp;&nbsp;integer page number.<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;page = matchpage(bookmark[PAGE].getObject(), pages)<br />&nbsp;&nbsp;&nbsp;&nbsp;title = bookmark[TITLE]<br />&nbsp;&nbsp;&nbsp;&nbsp;# One bookmark per page per level.<br />&nbsp;&nbsp;&nbsp;&nbsp;lookupdict = OrderedDict()<br />&nbsp;&nbsp;&nbsp;&nbsp;lookupdict.update({page:{NAME:title,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;CHILDREN:OrderedDict()}})<br />&nbsp;&nbsp;&nbsp;&nbsp;return lookupdict, page</span><br />
<span style="font-family: "courier new";"></span><br />
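<span style="font-family: "courier new" , "courier" , monospace;">def recursivepopulater(bookmark, pages):<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;Fills in child nodes of bookmarks<br />&nbsp;&nbsp;&nbsp;&nbsp;recursively and returns dictionary.<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;dictx = OrderedDict()<br />&nbsp;&nbsp;&nbsp;&nbsp;for pagex in bookmark:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if type(pagex) is NODE:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# get page info and update dictionary with it<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;lookupdict, page = pagedict(pagex, pages)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dictx.update(lookupdict)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;elif type(pagex) is CHILD:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# a nested list holds the children of the bookmark just added<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dictx[page][CHILDREN].update(recursivepopulater(pagex, pages))<br />&nbsp;&nbsp;&nbsp;&nbsp;return dictx</span><br />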
<span style="font-family: "courier new";"></span><br />
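<span style="font-family: "courier new" , "courier" , monospace;">def makenewbookmarks(pages, bookmarks):<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;Main function to generate bookmark dictionary:</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;{page number: {name: &lt;name&gt;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;children: {&lt;more bookmarks&gt;}},<br />&nbsp;&nbsp;&nbsp;&nbsp;and so on.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;Returns dictionary.<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;dictx = OrderedDict()<br />&nbsp;&nbsp;&nbsp;&nbsp;# top level bookmarks<br />&nbsp;&nbsp;&nbsp;&nbsp;# it's going to go bookmark, list, bookmark, list, etc.<br />&nbsp;&nbsp;&nbsp;&nbsp;for bookmark in bookmarks:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if type(bookmark) is NODE:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# get page info and update dictionary with it<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;lookupdict, page = pagedict(bookmark, pages)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dictx.update(lookupdict)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;elif type(bookmark) is CHILD:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dictx[page][CHILDREN] = recursivepopulater(bookmark, pages)<br />&nbsp;&nbsp;&nbsp;&nbsp;return dictx</span><br />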
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def printbookmarkaddition(name, parentname, depthlevel):<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;Print notification of bookmark addition.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;Indentation based on integer depthlevel.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;name is the string name of the bookmark.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;parentname is the string name of the parent<br />&nbsp;&nbsp;&nbsp;&nbsp;bookmark.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;Side effect function.<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;args = name, parentname, depthlevel<br />&nbsp;&nbsp;&nbsp;&nbsp;indent = depthlevel * INDENT<br />&nbsp;&nbsp;&nbsp;&nbsp;print(indent + ADDEDBOOKMARK.format(*args))</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">def dealwithbookmarks(comparisonreader, output, bookmarkdict, depthlevel, levelparent=None, parentname=None):<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;Fix bookmarks so that they are properly<br />&nbsp;&nbsp;&nbsp;&nbsp;placed in the new document with the blank<br />&nbsp;&nbsp;&nbsp;&nbsp;pages removed. Recursive side effect function.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;comparisonreader is the PDF reader object<br />&nbsp;&nbsp;&nbsp;&nbsp;for the original document.</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;output is the PDF writer object for the<br />&nbsp;&nbsp;&nbsp;&nbsp;final document.</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;bookmarkdict is a dictionary of bookmarks.</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;depthlevel is the depth inside the nested<br />&nbsp;&nbsp;&nbsp;&nbsp;dictionary-list structure (0 is the top).</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;levelparent is the parent bookmark.</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;parentname is the name of the parent bookmark.<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;depthlevel += 1<br />&nbsp;&nbsp;&nbsp;&nbsp;for pagekeylevel in bookmarkdict:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;namelevel = bookmarkdict[pagekeylevel][NAME]<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;levelparentii = output.addBookmark(namelevel, pagekeylevel, levelparent)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if depthlevel == 0:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;parentname = TOPLEVEL<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;printbookmarkaddition(namelevel, parentname, depthlevel)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;fixbookmark(levelparentii)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# dictionary<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;secondlevel = bookmarkdict[pagekeylevel][CHILDREN]<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;argsx = comparisonreader, output, secondlevel, depthlevel, levelparentii, namelevel<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Recursive call.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dealwithbookmarks(*argsx)</span><br />
<span style="font-family: "courier new";"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">def cullpages():<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;Fix SSRS PDF dump by removing blank<br />&nbsp;&nbsp;&nbsp;&nbsp;pages.<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;ssrsdump = open(INPUTFILE, 'rb')<br />&nbsp;&nbsp;&nbsp;&nbsp;finalreport = open(OUTPUTFILE, 'wb')<br />&nbsp;&nbsp;&nbsp;&nbsp;comparisonreader = pdf.PdfFileReader(ssrsdump)<br />&nbsp;&nbsp;&nbsp;&nbsp;pageindices = getpages(comparisonreader)<br />&nbsp;&nbsp;&nbsp;&nbsp;output = pdf.PdfFileWriter()<br />&nbsp;&nbsp;&nbsp;&nbsp;# add pages from SSRS dump to new pdf doc<br />&nbsp;&nbsp;&nbsp;&nbsp;for (old, new) in pageindices:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print(ADDPAGE.format(old, new))<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pagex = comparisonreader.getPage(old)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;output.addPage(pagex)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;# Attempt to add bookmarks from original doc<br />&nbsp;&nbsp;&nbsp;&nbsp;# getOutlines yields a list of nested dictionaries and lists:<br />&nbsp;&nbsp;&nbsp;&nbsp;# outermost list - starts with parent bookmark (dictionary)<br />&nbsp;&nbsp;&nbsp;&nbsp;# inner list - starts with child bookmark (dictionary)<br />&nbsp;&nbsp;&nbsp;&nbsp;# and so on<br />&nbsp;&nbsp;&nbsp;&nbsp;# The SSRS dump and this list have bookmarks in correct order.<br />&nbsp;&nbsp;&nbsp;&nbsp;bookmarks = comparisonreader.getOutlines()<br />&nbsp;&nbsp;&nbsp;&nbsp;# Get page numbers using this methodology (indirect object references)<br />&nbsp;&nbsp;&nbsp;&nbsp;# </span><a href="http://stackoverflow.com/questions/1918420/split-a-pdf-based-on-outline"><span style="font-family: "courier new" , "courier" , monospace;">http://stackoverflow.com/questions/1918420/split-a-pdf-based-on-outline</span></a><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;# list of IndirectObject's of pages in order<br />&nbsp;&nbsp;&nbsp;&nbsp;pages = [pagen.getObject() for pagen in<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;comparisonreader.trailer[ROOT].getObject()[PAGES].getObject()[KIDS]]<br />&nbsp;&nbsp;&nbsp;&nbsp;# Bookmarks.<br />&nbsp;&nbsp;&nbsp;&nbsp;# Top level is list of bookmarks.<br />&nbsp;&nbsp;&nbsp;&nbsp;# List goes parent bookmark (Destination object)<br />&nbsp;&nbsp;&nbsp;&nbsp;# child bookmarks (list)<br />&nbsp;&nbsp;&nbsp;&nbsp;# and so on.<br />&nbsp;&nbsp;&nbsp;&nbsp;bookmarkdict = makenewbookmarks(pages, bookmarks)<br />&nbsp;&nbsp;&nbsp;&nbsp;# Initial level of -1 allows increment to 0 at start.<br />&nbsp;&nbsp;&nbsp;&nbsp;dealwithbookmarks(comparisonreader, output, bookmarkdict, -1)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp;&nbsp;&nbsp;&nbsp;print('\n\nWriting final report . . .')<br />&nbsp;&nbsp;&nbsp;&nbsp;output.write(finalreport)<br />&nbsp;&nbsp;&nbsp;&nbsp;finalreport.close()<br />&nbsp;&nbsp;&nbsp;&nbsp;ssrsdump.close()<br />&nbsp;&nbsp;&nbsp;&nbsp;print('\n\nFinished.\n\n')</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">if __name__ == '__main__':<br />&nbsp;&nbsp;&nbsp;&nbsp;cullpages()</span>Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com3tag:blogger.com,1999:blog-524230429673765509.post-39626240276177336152014-08-31T20:36:00.001-07:002014-08-31T20:36:46.700-07:00Internet Explorer 9 Save Dialog - SendKeys Last ResortAt work we use Internet Explorer 9 on Windows 7 Enterprise. SharePoint is the favored software for file sharing inside organizational groups. Our mine planning office is in the States; the mine operation whose data I work with is in a remote, poorly connected part of the world.<br />
<br />
Recently SharePoint was updated to a new version at the mine. The SharePoint server configuration there no longer allows Windows Explorer view or mapping of the site to a Windows drive letter. I've put in a trouble ticket to regain this functionality, but if it's possible at all, that may take a while. Without it, it is difficult to automate file retrieval or to get more than one file at a time.<br />
<br />
In the meantime I've been able to get the text-based files over by using win32com automation in Python to run Internet Explorer and grab the innerHTML property. innerHTML is essentially the text of the file with tags around it. I rip out the tags, write the text to a file on my hard drive, and I'm good to go.<br />
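The tag-stripping step can be handled by the standard library's html.parser instead of hand-rolled string slicing. This is a minimal Python 3 sketch (the post's own code is Python 2.7, where the module is named HTMLParser); strip_tags and save_text are hypothetical names, and inner_html is assumed to be the innerHTML string already read from the page through win32com. Setting convert_charrefs=True also turns entities such as &amp; back into plain characters.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects the character data between tags and ignores the tags."""

    def __init__(self):
        # convert_charrefs=True decodes entities (&amp; -> &, &lt; -> <, ...)
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(inner_html):
    """Return the text of an innerHTML string with all tags removed."""
    parser = _TextOnly()
    parser.feed(inner_html)
    parser.close()  # flush any text still buffered after the last tag
    return ''.join(parser.chunks)


def save_text(inner_html, path):
    """Write the tag-free text to `path` and return it (hypothetical helper)."""
    text = strip_tags(inner_html)
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    return text
```

For example, strip_tags('<pre>a &amp; b</pre>') gives 'a & b'.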
<br />
Binary files proved to be more difficult to download. Shown below is a screenshot of the Internet Explorer 9 dialog box that goes by the generic name <a href="http://windows.microsoft.com/en-us/windows7/internet-explorer-9-keyboard-shortcuts" target="_blank">Notification Bar</a>:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyKilzDP80em2ohJb0mcR7zxwrrpvVYZk8XwlO40-lg-9UR_jfkzPH5KZyvKKv5YWv4yRrEbI7olR5eo6tkEaz8A6C_JpecbvY76aNGBAdUYsUgG9giLpDiYQ5KURq2JYuizrxsy0tPeo/s1600/savedialog.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyKilzDP80em2ohJb0mcR7zxwrrpvVYZk8XwlO40-lg-9UR_jfkzPH5KZyvKKv5YWv4yRrEbI7olR5eo6tkEaz8A6C_JpecbvY76aNGBAdUYsUgG9giLpDiYQ5KURq2JYuizrxsy0tPeo/s1600/savedialog.PNG" height="16" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: left;">
I googled and couldn't find anywhere how this thing fit into the Internet Explorer 9 Document object hierarchy. Then I came upon this <a href="http://social.msdn.microsoft.com/Forums/ie/en-US/5f689e19-5676-4d5d-9e37-4d44b7c7da0c/ie9-run-save-as-cancel-dialog-box-at-bottom-need-asap-pls-bypassing-user-interaction-for?forum=iewebdevelopment" target="_blank">colorful exchange between Microsoft Certified MVPs from 2012</a> that made things a little clearer.</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
It turns out you can't access the Notification Bar programmatically per se. What you can do is activate the specific Internet Explorer window and tab you're interested in, then send keystrokes to get to the right control, click it, and download your file.</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
I'm neither a web programmer nor a dedicated Windows programmer (I'm actually a geologist). <a href="http://www.mayukhbose.com/python/IEC/" target="_blank">IEC</a> is a small module that wraps some useful functionality, in my case identifying and clicking on the link on the SharePoint page by its text identifier:</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;"># CPython 2.7</span></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;"># Internet Explorer module.<br />import IEC as iec</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;"></span> </div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;">import time</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;"></span> </div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;">ie = iec.IEController()</span></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;">ie.Navigate(<URL of SharePoint page>)</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;"># Give the page time to load (7 seconds).</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;">time.sleep(7)</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;"># I want to download file 11.msr.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;">ie.ClickLink('11')</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;"># Give 5 seconds for the Notification Bar to show up.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;">time.sleep(5)</span></div>
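The fixed seven-second sleep is a guess about page-load time. A hedged alternative is to poll until a condition holds, sketched below in Python 3 with only the standard library; wait_until is a hypothetical helper. The IE-specific condition in the comment is an assumption: it presumes you can reach the underlying InternetExplorer.Application COM object (it's not clear whether IEC exposes it), whose Busy and ReadyState properties report load progress, with READYSTATE_COMPLETE equal to 4. The Notification Bar is not part of the page, so the pause after ClickLink would still need a fixed wait.

```python
import time


def wait_until(condition, timeout=30.0, interval=0.25):
    """Call condition() repeatedly until it returns something truthy.

    Returns True as soon as the condition holds, or False once `timeout`
    seconds have passed without it holding.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical use with an InternetExplorer.Application object `ie_app`
# (for example win32com.client.Dispatch('InternetExplorer.Application')):
#     loaded = wait_until(lambda: not ie_app.Busy and ie_app.ReadyState == 4)
```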
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
I'm fortunate in that our mine planning vendor, <a href="http://www.minesight.com/" target="_blank">MineSight</a>, ships Python 2.7 and the associated win32com packages along with their software (their APIs are written for Python). If you don't have win32com and friends installed, you'll need them for this solution.</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
At this point I've just got to deal with that pesky Internet Explorer 9 Notification Bar. As it turns out, SendKeys makes it doable (although neither elegant nor robust :-( ):</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;"># Activate the SharePoint page.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;">from win32com.client import Dispatch as dispx</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;">shell = dispx('WScript.Shell')</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;">shell.AppActivate(<name of IE9 tab>)</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;"># Little pause.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;">time.sleep(0.5)</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;"># Keyboard combination for the Notification Bar selection</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;"># is ALT-N or '%n'</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;">shell.SendKeys('%n', True)</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;"># The Notification Bar goes to "Open" by default.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;"># You need to tab over to the "Save" button.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;">shell.SendKeys('{TAB}')</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;"># Another little pause.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;">time.sleep(0.1)</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;"># Space bar clicks on this control.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "Courier New", Courier, monospace;">shell.SendKeys(' ', True)</span></div>
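The keystroke sequence above can be packaged as a function so that each download is a single call. This is a hypothetical wrapper, not code from the original script: save_from_notification_bar and its parameters are made up, shell is assumed to be the WScript.Shell object created with Dispatch('WScript.Shell'), and the pauses are the same guesses used above. Passing in the shell and the sleep function lets you check the sequence against a stand-in object without Internet Explorer running.

```python
import time


def save_from_notification_bar(shell, window_title, sleep=time.sleep):
    """Replay the IE9 Notification Bar keystrokes for one download.

    shell: assumed to be a WScript.Shell COM object; any object with
           AppActivate() and SendKeys() methods will do.
    window_title: title of the IE9 window/tab holding the SharePoint page.
    """
    shell.AppActivate(window_title)  # bring the IE9 window to the front
    sleep(0.5)                       # little pause
    shell.SendKeys('%n', True)       # ALT-N: jump to the Notification Bar
    shell.SendKeys('{TAB}')          # move from "Open" to "Save"
    sleep(0.1)                       # another little pause
    shell.SendKeys(' ', True)        # space bar "clicks" the Save button
```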
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Courier New;"></span> </div>
<div class="separator" style="clear: both; text-align: left;">
The key combinations for accessing the Notification Bar are in Microsoft's documentation <a href="http://windows.microsoft.com/en-us/windows7/internet-explorer-9-keyboard-shortcuts" target="_blank">here</a>. </div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
One page showing the use of SendKeys is on a German site (mostly English text) <a href="http://win32com.goermezer.de/content/view/136/284/" target="_blank">here</a>.</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
And that's pretty much it. There's another dialog that pops up in Internet Explorer 9 after the file is downloaded. I've been able to blow that off so far, and it hasn't gotten in the way as I move on to the next download. I give these files (about 300 KB) 15 seconds to download over a slow connection. I may have to adjust that.</div>
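Instead of a fixed 15-second allowance, one option is to watch for the downloaded file itself. This Python 3 sketch assumes the finished file shows up in a known download folder under its own name (for example 11.msr), and it treats the file as complete once its size is non-zero and unchanged between two consecutive checks. The function name, timeout, and interval are illustrative, not taken from the original script.

```python
import os
import time


def wait_for_download(path, timeout=60.0, interval=1.0):
    """Return True once `path` exists with a stable, non-zero size.

    Returns False if `timeout` seconds pass first. "Stable" here means the
    size did not change between two checks `interval` seconds apart.
    """
    deadline = time.monotonic() + timeout
    last_size = None
    while time.monotonic() < deadline:
        if os.path.exists(path):
            size = os.path.getsize(path)
            if size > 0 and size == last_size:
                return True
            last_size = size
        else:
            last_size = None
        time.sleep(interval)
    return False
```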
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
This solution is an abomination by any coding/architecture/durability standard. Still, it's the abomination that is getting the job done for the time being.</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
Thanks for stopping by.</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com8tag:blogger.com,1999:blog-524230429673765509.post-13678048259956601132014-03-28T14:05:00.000-07:002014-03-28T14:48:06.459-07:00Editing a PDF file with Python (with a little help from PDFTKBuilder)I'm working with a report published with SQL Server Reporting Services (SSRS). The report is located on a remote server in Africa. It is inconvenient for management in North America to view the report and print it from a browser (slow connection, formatting issues). Instead, management would like a PDF file of the report to be e-mailed out to a distribution list.<br />
<br />
This post deals with taking the PDF dumped from the SSRS web report and cleaning it up for viewing and navigation (bookmarking). I didn't know a great deal about PDFs before working on this, and my ignorance will probably show in the terminology I use and in my approach. Even so, the problem was a bit more involved than I anticipated. My intent is to put my experience out there and, if I have made things harder than necessary, get some feedback in the comments.<br />
<br />
I think it's fair to say that SSRS is not a mature product yet, but in a Microsoft/Windows environment its usefulness outweighs that. The dump-to-PDF and dump-to-Excel features for reports are handy, but they don't always yield output consistent with the SSRS web report. The first problem I had was a "corrupt" PDF dump. The file opens fine in Acrobat Reader, but it doesn't behave well when you try to copy its contents, with modifications, to another file with <a href="https://pypi.python.org/pypi/PyPDF2/1.20">PyPDF2</a> (this is just a straight copy of pages from one PDF file to a new one):<br />
<br />
<span style="font-family: "Courier New", Courier, monospace; font-size: large;"><strong>Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32<br />Type "help", "copyright", "credits" or "license" for more information.<br />>>> import PyPDF2 as pdf<br />>>> dumpfile = open('baddumpfromssrs.pdf', 'rb')<br />>>> reader = pdf.PdfFileReader(dumpfile)<br />>>> numpages = reader.getNumPages()<br />>>> numpages<br />54<br />>>> outputfile = open('testoutput.pdf', 'wb')<br />>>> writer = pdf.PdfFileWriter()<br />>>> for x in xrange(numpages):<br />... writer.addPage(reader.getPage(x))<br />...<br />>>> writer.write(outputfile)<br />Traceback (most recent call last):<br /> File "<stdin>", line 1, in <module><br /> File "c:\python27\lib\site-packages\PyPDF2\pdf.py", line 279, in write<br /> self._sweepIndirectReferences(externalReferenceMap, self._root)<br /> File "c:\python27\lib\site-packages\PyPDF2\pdf.py", line 367, in _sweepIndirectReferences<br /> self._sweepIndirectReferences(externMap, realdata)<br /> File "c:\python27\lib\site-packages\PyPDF2\pdf.py", line 343, in _sweepIndirectReferences<br /> value = self._sweepIndirectReferences(externMap, value)<br /> File "c:\python27\lib\site-packages\PyPDF2\pdf.py", line 367, in _sweepIndirectReferences<br /> self._sweepIndirectReferences(externMap, realdata)<br /> File "c:\python27\lib\site-packages\PyPDF2\pdf.py", line 343, in _sweepIndirectReferences<br /> value = self._sweepIndirectReferences(externMap, value)<br /> File "c:\python27\lib\site-packages\PyPDF2\pdf.py", line 352, in _sweepIndirectReferences<br /> value = self._sweepIndirectReferences(externMap, data[i])<br /> File "c:\python27\lib\site-packages\PyPDF2\pdf.py", line 367, in _sweepIndirectReferences<br /> self._sweepIndirectReferences(externMap, realdata)<br /> File "c:\python27\lib\site-packages\PyPDF2\pdf.py", line 343, in _sweepIndirectReferences<br /> value = self._sweepIndirectReferences(externMap, value)<br /> File 
"c:\python27\lib\site-packages\PyPDF2\pdf.py", line 343, in _sweepIndirectReferences<br /> value = self._sweepIndirectReferences(externMap, value)<br /> File "c:\python27\lib\site-packages\PyPDF2\pdf.py", line 343, in _sweepIndirectReferences<br /> value = self._sweepIndirectReferences(externMap, value)<br /> File "c:\python27\lib\site-packages\PyPDF2\pdf.py", line 381, in _sweepIndirectReferences<br /> newobj = self._sweepIndirectReferences(externMap, newobj)<br /> File "c:\python27\lib\site-packages\PyPDF2\pdf.py", line 343, in _sweepIndirectReferences<br /> value = self._sweepIndirectReferences(externMap, value)<br /> File "c:\python27\lib\site-packages\PyPDF2\pdf.py", line 372, in _sweepIndirectReferences<br /> newobj = data.pdf.getObject(data)<br /> File "c:\python27\lib\site-packages\PyPDF2\pdf.py", line 1164, in getObject<br /> retval = readObject(self.stream, self)<br /> File "c:\python27\lib\site-packages\PyPDF2\generic.py", line 71, in readObject<br /> return DictionaryObject.readFromStream(stream, pdf)<br /> File "c:\python27\lib\site-packages\PyPDF2\generic.py", line 587, in readFromStream<br /> value = readObject(stream, pdf)<br /> File "c:\python27\lib\site-packages\PyPDF2\generic.py", line 91, in readObject<br /> return NumberObject.readFromStream(stream)<br /> File "c:\python27\lib\site-packages\PyPDF2\generic.py", line 257, in readFromStream<br /> return NumberObject(num)<br />ValueError: invalid literal for int() with base 10: ''<br />>>></strong></span><br />
<strong><span style="font-family: Courier New; font-size: large;"></span></strong><br />
<span style="font-family: inherit;">Bummer. I had to google around until something showed up on <a href="http://stackoverflow.com/questions/6393800/split-pdf-files-in-python-valueerror-invalid-literal-for-int-with-base-10">Stack Overflow</a>. A comment on the post suggests using pdftk to "un-corrupt" the file. To avoid having an admin install something on my work computer, I downloaded <a href="http://portableapps.com/apps/office/pdftk_builder_portable">PDFTKBuilder Portable</a>. This is really overkill, because I didn't need the user interface to clean up the file. The App folder in the portable install has the pdftk command line tool:</span><br />
<span style="font-size: x-small;"></span><br />
<strong><span style="font-family: "Courier New", Courier, monospace; font-size: large;">C:\blogdoc>pdftk baddumpfromssrs.pdf output good.pdf</span></strong><br />
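If the repair step needs to run unattended, the same pdftk command line can be issued from Python with subprocess. This is a sketch under stated assumptions: repair_pdf is a hypothetical name, pdftk is assumed to be on the PATH unless you pass the full path to the executable in the PDFTKBuilder Portable App folder (that path isn't given here), and the run parameter exists only so the command can be checked without pdftk installed. It's still worth confirming that PyPDF2 can read the output afterwards.

```python
import subprocess


def repair_pdf(src, dst, pdftk='pdftk', run=subprocess.run):
    """Rewrite `src` into `dst` with pdftk, like
    `pdftk baddumpfromssrs.pdf output good.pdf`.

    Returns the command list that was run. With the default `run`,
    a non-zero exit status from pdftk raises CalledProcessError.
    """
    cmd = [pdftk, src, 'output', dst]
    run(cmd, check=True)
    return cmd
```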
<strong><span style="font-family: Courier New; font-size: large;"></span></strong><br />
<span style="font-family: inherit;">This "worked" in terms of preparing the file for Python and PyPDF2, but only after I had opened it in Adobe Reader and closed it again. I'm working in a corporate environment under Windows 7. I double-checked that I was issuing the same command in the command window (I used the up arrow key to recall it, and the recalled command is the one that worked). I don't know what's going on there. The important thing was that I could proceed with the non-corrupt file good.pdf.</span><br />
<br />
While I'm on things that weren't working, I should probably mention the Python 3/Python 2 thing. This originally gave an error on Python 3; when I tried to reproduce the problem, it hung forever and I had to kill it with Ctrl-C:<br />
<span style="font-family: inherit;"></span><br />
<span style="font-family: "Courier New", Courier, monospace; font-size: large;"><strong>Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:24:06) [MSC v.1600 32 bit (Intel)] on win32<br />Type "help", "copyright", "credits" or "license" for more information.<br />>>> import PyPDF2 as pdf<br />>>> inputfile = open('good.pdf', 'rb')<br />>>> reader = pdf.PdfFileReader(inputfile)<br />>>> pagex = reader.getPage(0)<br />>>> pagex.extractText()<br />Traceback (most recent call last):<br /> File "<stdin>", line 1, in <module><br /> File "c:\python34\lib\site-packages\PyPDF2\pdf.py", line 2070, in extractText<br /> content = ContentStream(content, self.pdf)<br /> File "c:\python34\lib\site-packages\PyPDF2\pdf.py", line 2153, in __init__<br /> self.__parseContentStream(stream)<br /> File "c:\python34\lib\site-packages\PyPDF2\pdf.py", line 2173, in __parseContentStream<br /> operator += tok<br />KeyboardInterrupt</strong></span><br />
<strong><span style="font-family: Courier New; font-size: large;"></span></strong><br />
<span style="font-family: inherit;">I'm a bit of a Python 3 advocate, sometimes even a zealot. Still, pain won out over conviction and I switched to Python 2.7, where I got better results. This problem (with an error) is mentioned on <a href="http://stackoverflow.com/questions/19179043/pypdf2-typeerror-when-trying-to-run-example-from-lib">StackOverflow</a>. A comment there mentions replacing a couple of PyPDF2 source files to get it running under Python 3. I couldn't find the link to those files and took the expedient Python 2.7 route.</span><br />
<br />
This is about where everything started to work the way it was supposed to. Now I could get down to fixing the SSRS pdf report dump. The first thing that needed to happen was the removal of a bunch of blank pages from the report. On the SSRS web report they weren't there, but the PDF file had a blank page everywhere there was a page break. Conveniently, this was every other page:<br />
<br />
<span style="font-family: "Courier New", Courier, monospace; font-size: large;"><strong>Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32<br />Type "help", "copyright", "credits" or "license" for more information.<br />>>> import PyPDF2 as pdf<br />>>> inputfile = open('good.pdf', 'rb')<br />>>> reader = pdf.PdfFileReader(inputfile)<br />>>> numpages = reader.getNumPages()<br />>>> numpages<br />54<br />>>> contentpages = (x for x in xrange(numpages) if x % 2 == 0)<br />>>> writer = pdf.PdfFileWriter()<br />>>> for n in contentpages:<br />... pagex = reader.getPage(n)<br />... writer.addPage(pagex)<br />...<br />>>> outputfile = open('testoutput.pdf', 'wb')<br />>>> writer.write(outputfile)<br />>>> outputfile.close()<br />>>> inputfile.close()</strong></span><br />
<strong><span style="font-family: Courier New; font-size: large;"></span></strong><br />
<span style="font-family: inherit;">Super! Now I've got a 27-page document with content on every page. The next thing I needed in my case was a banner or mark across each page saying "DRAFT FORMAT." The idea was that this was a sample report being circulated for comments and approval.</span><br />
<br />
I didn't want super bold red text across the page, but rather white text outlined in red. Some googling paid off with a <a href="http://two.pairlist.net/pipermail/reportlab-users/2007-November/006484.html">suggestion</a> from a mailing list. <a href="https://pypi.python.org/pypi/reportlab">reportlab</a>.pdfgen is the tool used for creating the file with the banner. We'll merge it with the pages of the main report document later.<br />
<br />
<strong><span style="font-family: "Courier New", Courier, monospace; font-size: large;">Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32<br />Type "help", "copyright", "credits" or "license" for more information.<br />>>> from reportlab.pdfgen import canvas as canx<br />>>> c = canx.Canvas('banner.pdf')<br />>>> # want red outline<br />...<br />>>> c.setStrokeColor((1, 0, 0))<br />>>> # inside of letters should be white<br />...<br />>>> c.setFillColor((1, 1, 1))<br />>>> c.setLineWidth(1.0)<br />>>> t = c.beginText()<br />>>> t.setTextRenderMode(2)<br />>>> c._code.append(t.getCode())<br />>>> c.setFont('Helvetica', 48)<br />>>> # origin is at bottom, left of page<br />...<br />>>> c.drawString(2 * 72, 7 * 72, 'DRAFT FORMAT')<br />>>> c.save()<br />>>><br />>>></span></strong><br />
<strong><span style="font-family: Courier New; font-size: large;"></span></strong><br />
<span style="font-family: inherit;">Great, I've got a banner.</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDkCDC2clTNeQ35qwsoiBf0pv2ImhNiYqL9jFDXwdjm0K1JafWW-wh5vdtMuJo45DXhpykjR3aiCeig-BS-3DODpR_pE5I5Vig9a9MioYgXDb6lgQCZzwKPOnr3rAEEe_rsewzOs1Sk-w/s1600/banner.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDkCDC2clTNeQ35qwsoiBf0pv2ImhNiYqL9jFDXwdjm0K1JafWW-wh5vdtMuJo45DXhpykjR3aiCeig-BS-3DODpR_pE5I5Vig9a9MioYgXDb6lgQCZzwKPOnr3rAEEe_rsewzOs1Sk-w/s1600/banner.PNG" height="320" width="219" /></a></div>
<br />
<span style="font-family: inherit;">There is a whole bunch of stuff in that code segment that I'm leaving unexplained. Not a big surprise, but to use a Python API to edit PDFs, you need to know something about the format. This has been a huge learning experience for me over the course of a day or two. What helped me most is the reportlab <a href="http://python.net/~gherman/tmp/rl118api.pdf">documentation</a>. After copying a code snippet and seeing that it worked, I could go back there and try to figure out how it works. This learning experience is a work in progress. There are things you pick up right away, though. For instance, Adobe Reader comes with 14 base <a href="http://forums.adobe.com/thread/1109176">fonts</a>, of which Helvetica is one. Who knew? Not I!</span><br />
<br />
My banner isn't quite the way I want it. It's horizontal and I would like to tilt it to 45 degrees. Google again to the rescue. Some kind soul has already covered it on a <a href="http://wa5pb.freeshell.org/motd/?p=769">blog</a>.<br />
<br />
<strong><span style="font-family: "Courier New", Courier, monospace; font-size: large;">Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32<br />Type "help", "copyright", "credits" or "license" for more information.<br />>>> from reportlab.pdfgen import canvas as canx<br />>>> c = canx.Canvas('banner.pdf')<br />>>> c.setStrokeColor((1, 0, 0))<br />>>> c.setFillColor((1, 1, 1))<br />>>> c.setLineWidth(1.0)<br />>>> t = c.beginText()<br />>>> t.setTextRenderMode(2)<br />>>> c._code.append(t.getCode())<br />>>> c.setFont('Helvetica', 48)<br />>>> c.saveState()<br />>>> c.translate(100, 100)<br />>>> c.rotate(45)<br />>>> c.drawCentredString(500, 100, 'DRAFT FORMAT')<br />>>> c.save()<br />>>></span></strong><br />
<br />
<span style="font-family: inherit;">Close enough. Confession: I don't really think things through and work out with trigonometry what it will take to get the placement right; I just "hack" until it looks about right. This is a habit I should break if I continue to have to play with PDFs.</span><br />
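The placement can be worked out instead of hacked. In reportlab, translate(tx, ty) followed by rotate(angle) maps a point (x, y) drawn in the rotated frame to the page point (tx + x*cos(angle) - y*sin(angle), ty + x*sin(angle) + y*cos(angle)), with the angle counterclockwise in degrees. The Python 3 sketch below applies that formula and its inverse; the function names are hypothetical, and it assumes the canvas uses reportlab's default A4 page size (595.27 x 841.89 points) because no pagesize was passed to Canvas. It locates only the anchor point: drawCentredString centers the text horizontally on its baseline, so the visual middle of the banner still sits a little above the anchor.

```python
import math


def rotated_to_page(tx, ty, angle_deg, x, y):
    """Page coordinates of a point drawn at (x, y) after
    canvas.translate(tx, ty) followed by canvas.rotate(angle_deg)."""
    a = math.radians(angle_deg)
    return (tx + x * math.cos(a) - y * math.sin(a),
            ty + x * math.sin(a) + y * math.cos(a))


def page_to_rotated(tx, ty, angle_deg, px, py):
    """Inverse mapping: the (x, y) to pass to drawCentredString so the
    string's anchor lands on page point (px, py) under the same transform."""
    a = math.radians(angle_deg)
    dx, dy = px - tx, py - ty
    return (dx * math.cos(a) + dy * math.sin(a),
            -dx * math.sin(a) + dy * math.cos(a))


# Assumed page size: reportlab's default A4, in points.
PAGE_W, PAGE_H = 595.27, 841.89

# Where drawCentredString(500, 100, ...) after translate(100, 100), rotate(45) lands:
print(rotated_to_page(100, 100, 45, 500, 100))  # about (382.8, 524.3)

# Arguments that would anchor the banner at the page center instead:
print(page_to_rotated(100, 100, 45, PAGE_W / 2, PAGE_H / 2))  # about (366.7, 87.2)
```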
<br />
<br />
<br />
<br />
<span style="font-size: x-small;"></span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXlNMj2BQE5E9LebpBcMHInoiWB5qa2WFPynNHtWQrAcEDTWdRtHDJqQFAuB8nMcKYhdoK-gEzc8nljAzL3klLY_S427qYEzFdZXhMMsEavq_JkVVHxJXrkcXde1JbsWWyS9dT4L1-fj0/s1600/banner2.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXlNMj2BQE5E9LebpBcMHInoiWB5qa2WFPynNHtWQrAcEDTWdRtHDJqQFAuB8nMcKYhdoK-gEzc8nljAzL3klLY_S427qYEzFdZXhMMsEavq_JkVVHxJXrkcXde1JbsWWyS9dT4L1-fj0/s1600/banner2.PNG" height="320" width="218" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
Now we'll merge the banner with some pages from another PDF to make a new document. I'm going to use pages from the reportlab documentation because the PDF I generated above is full of work material.<br />
<span style="font-size: x-small;"></span><br />
<span style="font-family: "Courier New", Courier, monospace; font-size: large;"><strong>Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32<br />Type "help", "copyright", "credits" or "license" for more information.<br />>>> import PyPDF2 as pdf<br />>>> bannerfile = open('banner.pdf', 'rb')<br />>>> docfile = open('docfile.pdf', 'rb')<br />>>> outputfile = open('newfile.pdf', 'wb')<br />>>> readerbanner = pdf.PdfFileReader(bannerfile)<br />>>> readerdoc = pdf.PdfFileReader(docfile)<br />>>> writernewdoc = pdf.PdfFileWriter()<br />>>> pagesdoc = (readerdoc.getPage(x) for x in xrange(286, 291))<br />>>> for pagen in pagesdoc:<br />... writernewdoc.addPage(pagen)<br />...<br />>>> writernewdoc.write(outputfile)<br />>>> outputfile.close()<br />>>> docfile.close()<br />>>> # now merge banner to pages of new file<br />...<br />>>> opaquebannerfile = open('opaquebannerfile.pdf', 'wb')<br />>>> testpagefile = open('newfile.pdf', 'rb')<br />>>> bannerpage = readerbanner.getPage(0)<br />>>> readertestpages = pdf.PdfFileReader(testpagefile)<br />>>> writeropaquebanner = pdf.PdfFileWriter()<br />>>> for x in xrange(readertestpages.getNumPages()):<br />... pagex = readertestpages.getPage(x)<br />... pagex.mergePage(bannerpage)<br />... writeropaquebanner.addPage(pagex)<br />...<br />>>> writeropaquebanner.write(opaquebannerfile)<br />>>> opaquebannerfile.close()<br />>>> bannerfile.close()<br />>>> testpagefile.close()<br />>>></strong></span><br />
<strong><span style="font-family: Courier New; font-size: large;"></span></strong><br />
<span style="font-family: inherit;">It's not perfect, but it's essentially what I wanted (my centering of the banner could be better).</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3NN2lr-v0Vi0sX584zCn1xPz5oSePrYuDKCDn2EzG8vV3W2cXuAU0wuK2gm-13tcVooXwZR11YbjRqu_53VoRiiHW5AK-wrJAsRGmeY6rPjyvYuziaIGz6Y8L37u9x3bWrEwnlz21beI/s1600/opaque.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3NN2lr-v0Vi0sX584zCn1xPz5oSePrYuDKCDn2EzG8vV3W2cXuAU0wuK2gm-13tcVooXwZR11YbjRqu_53VoRiiHW5AK-wrJAsRGmeY6rPjyvYuziaIGz6Y8L37u9x3bWrEwnlz21beI/s1600/opaque.PNG" height="190" width="320" /></a></div>
<br />
<br />
What if I wanted a transparent banner to emphasize the draft nature of the content rather than that of the format?<br />
<br />
<span style="font-family: "Courier New", Courier, monospace; font-size: large;"><strong>Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32<br />Type "help", "copyright", "credits" or "license" for more information.<br />>>> from reportlab.pdfgen import canvas as canx<br />>>> c = canx.Canvas('transparent.pdf')<br />>>> c.setStrokeColor((1, 0, 0))<br />>>> transparentwhite = canx.Color(1, 1, 1, alpha = 0.0)<br />>>> c.setFillColor(transparentwhite)<br />>>> t = c.beginText()<br />>>> t.setTextRenderMode(2)<br />>>> c._code.append(t.getCode())<br />>>> c.setFont('Helvetica', 48)<br />>>> c.saveState()<br />>>> c.translate(100, 100)<br />>>> c.rotate(45)<br />>>> c.drawCentredString(500, 100, 'DRAFT')<br />>>> c.save()<br />>>><br />>>> # merge again<br />...<br />>>> transparentbannerfile = open('transparent.pdf', 'rb')<br />>>> testpagefile = open('newfile.pdf', 'rb')<br />>>> outputfile = open('mergedtransparent.pdf', 'wb')<br />>>> import PyPDF2 as pdf<br />>>> readerbanner = pdf.PdfFileReader(transparentbannerfile)<br />>>> readertestpages = pdf.PdfFileReader(testpagefile)<br />>>> bannerpage = readerbanner.getPage(0)<br />>>> writeroutput = pdf.PdfFileWriter()<br />>>> for x in xrange(readertestpages.getNumPages()):<br />... pagex = readertestpages.getPage(x)<br />... pagex.mergePage(bannerpage)<br />... writeroutput.addPage(pagex)<br />...<br />>>> writeroutput.write(outputfile)<br />>>> outputfile.close()<br />>>> transparentbannerfile.close()<br />>>> testpagefile.close()<br />>>></strong></span><br />
<strong><span style="font-family: Courier New; font-size: large;"></span></strong><br />
<span style="font-family: inherit;">Not beautiful (I would make the banner font edge thinner), but it is indeed transparent.</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKcxQKCOx4DaM9M2YskEIzOky8ISdMVJpvcIP_AohbAPvtKGMTF8LfFd8327Ve2cLsyFLnDbJfWbcNSu_fEi5_UjnyUid-ty2jFbWJYBVHvmUmH33CfsIpPS_dwh1zpkTw8KGviHaNlfY/s1600/transparent.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKcxQKCOx4DaM9M2YskEIzOky8ISdMVJpvcIP_AohbAPvtKGMTF8LfFd8327Ve2cLsyFLnDbJfWbcNSu_fEi5_UjnyUid-ty2jFbWJYBVHvmUmH33CfsIpPS_dwh1zpkTw8KGviHaNlfY/s1600/transparent.PNG" height="165" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div align="left" class="separator" style="clear: both; text-align: center;">
</div>
The transparency comes from the alpha value of the color transparentwhite. There is some sample <a href="http://www.reportlab.com/snippets/9/">code</a> on reportlab.com's site that shows how to do this.<br />
<br />
The last thing I needed to deal with was bookmarks. I had some problems initially: although the bookmark showed up, it landed at the bottom of the page, underneath the SSRS tables and charts I was trying to reference. I got around this by digging into the dictionary structure of the PyPDF2 Bookmark object. Here is the (one line) function code:<br />
<br />
<span style="font-family: "Courier New", Courier, monospace; font-size: large;"><strong>OBJECTKEY = '/A'<br />LISTKEY = '/D'</strong></span><br />
<strong><span style="font-family: Courier New; font-size: large;"></span></strong><br />
<span style="font-family: "Courier New", Courier, monospace; font-size: large;"><strong>FULLPAGE = '/Fit'</strong></span><br />
<strong><span style="font-family: Courier New; font-size: large;"></span></strong><br />
<strong><span style="font-family: Courier New; font-size: large;">import PyPDF2 as pdf<br /><br />def fixbookmark(bookmark):<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;bookmark is a PyPDF2 bookmark object.</span></strong><br />
<strong><span style="font-family: Courier New; font-size: large;">&nbsp;&nbsp;&nbsp;&nbsp;Side effect function that changes the bookmark's<br />&nbsp;&nbsp;&nbsp;&nbsp;page display mode to full page.<br />&nbsp;&nbsp;&nbsp;&nbsp;"""<br />&nbsp;&nbsp;&nbsp;&nbsp;# getObject yields a dictionary: /A is the bookmark's action,<br />&nbsp;&nbsp;&nbsp;&nbsp;# /D its destination array [page, display mode, ...]<br />&nbsp;&nbsp;&nbsp;&nbsp;bookmark.getObject()[OBJECTKEY][LISTKEY][1] = pdf.generic.NameObject(FULLPAGE)<br />&nbsp;&nbsp;&nbsp;&nbsp;return 0</span></strong><br />
<strong><span style="font-family: Courier New; font-size: large;"></span></strong><br />
addBookmark is a method of the PyPDF2.PdfFileWriter object. It takes a string name, a zero-based page index, and an optional parent PyPDF2 Bookmark object. The changes fixbookmark makes to a bookmark "take" as long as it runs before the PDF is written to disk with the PdfFileWriter object's write method.<br />
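The same idea can be sketched on plain Python dicts and lists, assuming a bookmark (outline item) shaped like {'/A': {'/S': '/GoTo', '/D': [page, mode, ...]}}. The names fit_destination and fix_outline_item and the sample values are hypothetical. Unlike fixbookmark, which only overwrites the display mode, this version also drops any leftover parameters, since the PDF specification's [page /Fit] destination takes none. Running it on real PyPDF2 objects would mean wrapping the new list and names in PyPDF2's ArrayObject and NameObject types, which isn't shown here.<br />

```python
FULLPAGE = '/Fit'


def fit_destination(destination):
    """Return a new destination array that shows the whole page.

    destination is a list like [page_ref, '/FitH', 826]: the page first,
    then the display mode, then that mode's parameters. '/Fit' takes no
    parameters, so everything after the mode is dropped.
    """
    if len(destination) < 2:
        raise ValueError("expected [page, mode, ...]")
    return [destination[0], FULLPAGE]


def fix_outline_item(item):
    """Side-effect version: rewrite item['/A']['/D'] in place."""
    action = item['/A']
    action['/D'] = fit_destination(action['/D'])
    return item


# A plain-dict stand-in for a bookmark dictionary; 'page-1-ref' is a
# placeholder for the indirect reference to the target page.
bookmark = {
    '/Title': 'Sales chart',
    '/A': {'/S': '/GoTo', '/D': ['page-1-ref', '/FitH', 826]},
}
fix_outline_item(bookmark)
print(bookmark['/A']['/D'])
```

Returning a fresh two-element list, rather than assigning into index 1, keeps the destination well formed whatever mode and parameters the bookmark started with.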
<br />
Mike Driscoll <a href="http://www.blog.pythonlibrary.org/2012/07/11/pypdf2-the-new-fork-of-pypdf/">blogged</a> about PyPDF2 a couple of years back. In fact, he has a <a href="http://www.blog.pythonlibrary.org/tag/python-pdf-series/">whole series</a> on PDFs (aside: the man is a pragmatic programming blogging machine). Those posts have good code snippets and pretty good comment threads for newbs like me. I found the rl118.pdf doc useful for familiarizing myself with the PDF file format and the constants used to reference objects within it.<br />
<br />
This was a bit of an experience dump on my part. If you've read this far, thank you for your patience and for having a look.<br />
<br />
<br />
<br />Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com1tag:blogger.com,1999:blog-524230429673765509.post-43175242391396316892014-03-18T08:36:00.000-07:002014-03-18T08:36:09.259-07:00(Windows) LogParser - Install Without Admin RightsA Twitter acquaintance <a href="https://twitter.com/zippy1981">@zippy1981</a> recommended the Windows tool <a href="http://www.microsoft.com/en-us/download/details.aspx?id=24659">LogParser</a> as a replacement for MSSQL bcp for my data transfer needs. I downloaded the msi file from Microsoft and tried to install it. As with a lot of software at work, I got a message saying it couldn't be installed without admin rights.<br />
<br />
I tweeted back to @zippy1981 (Justin Dearing in "real life") saying I couldn't install it. He suggested using 7-Zip to decompress the msi file. I downloaded <a href="http://portableapps.com/apps/utilities/7-zip_portable">7-Zip Portable</a>, followed the instructions, and ended up with files with names like this one:<br />
<br />
LogParser_dll.B1735C0B_1CB5_4257_8281_92109AE41CE6<br />
<br />
These names are not handy, and the executable won't work with them, but they are easy enough to decipher: the period before the real extension has been replaced with an underscore, and a period followed by a long string of characters has been tacked on after the extension.<br />
<br />
Here is the mini, somewhat clunky script I wrote for "fixing" the filenames (I used Python 3.3):<br />
<br />
<br />
<span style="font-family: "Courier New", Courier, monospace; font-size: large;"><strong>"""<br />Remove extensions from extracted<br />msi files.<br />"""</strong></span><br />
<span style="font-family: Courier New; font-size: large;"><strong></strong></span><br />
<span style="font-family: "Courier New", Courier, monospace; font-size: large;"><strong>DIRX = 'C://UserPrograms//LogParserWorking//'</strong></span><br />
<span style="font-family: Courier New; font-size: large;"><strong></strong></span><br />
<span style="font-family: "Courier New", Courier, monospace; font-size: large;"><strong>import os<br />import shutil</strong></span><br />
<span style="font-family: "Courier New", Courier, monospace; font-size: large;"><strong>filenames = []</strong></span><br />
<span style="font-family: Courier New; font-size: large;"><strong></strong></span><br />
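<span style="font-family: "Courier New", Courier, monospace; font-size: large;"><strong>x = os.walk(DIRX)<br /># generator<br />for y in x:<br />&nbsp;&nbsp;&nbsp;&nbsp;# y is (directory, subdirectories, files)<br />&nbsp;&nbsp;&nbsp;&nbsp;for filex in y[2]:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# keep the directory so files in<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# subfolders are moved from the right place<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;filenames.append((y[0], filex))</strong></span><br />
<span style="font-family: Courier New; font-size: large;"><strong></strong></span><br />
<span style="font-family: "Courier New", Courier, monospace; font-size: large;"><strong>for dirx, filex in filenames:<br />&nbsp;&nbsp;&nbsp;&nbsp;# rip off end<br />&nbsp;&nbsp;&nbsp;&nbsp;# change _ to .<br />&nbsp;&nbsp;&nbsp;&nbsp;print(filex)<br />&nbsp;&nbsp;&nbsp;&nbsp;# reverse<br />&nbsp;&nbsp;&nbsp;&nbsp;filey = filex[-1::-1]<br />&nbsp;&nbsp;&nbsp;&nbsp;# strip<br />&nbsp;&nbsp;&nbsp;&nbsp;dotx = filey.find('.')<br />&nbsp;&nbsp;&nbsp;&nbsp;filey = filey[dotx + 1:]<br />&nbsp;&nbsp;&nbsp;&nbsp;# replace underscore<br />&nbsp;&nbsp;&nbsp;&nbsp;underscore = filey.find('_')<br />&nbsp;&nbsp;&nbsp;&nbsp;firstpart = filey[:underscore]<br />&nbsp;&nbsp;&nbsp;&nbsp;firstpart += '.'<br />&nbsp;&nbsp;&nbsp;&nbsp;secondpart = filey[underscore + 1:]<br />&nbsp;&nbsp;&nbsp;&nbsp;filey = firstpart + secondpart<br />&nbsp;&nbsp;&nbsp;&nbsp;filey = filey[-1::-1]<br />&nbsp;&nbsp;&nbsp;&nbsp;print(filey)<br />&nbsp;&nbsp;&nbsp;&nbsp;shutil.move(os.path.join(dirx, filex), os.path.join(dirx, filey))</strong></span><br />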
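For comparison, here is a sketch of the same renaming using str.rpartition instead of reversing strings. fixed_name and rename_all are hypothetical names; like the script above, it assumes the suffix 7-Zip added sits after the last period and that the last underscore before it stands in for the real extension's period. Files that don't fit that pattern are skipped rather than mangled.<br />

```python
import os
import shutil


def fixed_name(extracted):
    """Turn an extracted msi file name back into a usable one.

    Drops everything from the last period on, then turns the last
    underscore back into the period before the extension, e.g.
    'LogParser_dll.B1735C0B_1CB5_4257_8281_92109AE41CE6' -> 'LogParser.dll'.
    Returns None if the name doesn't follow that pattern.
    """
    stem, dot, _suffix = extracted.rpartition('.')
    if not dot:
        return None
    name, underscore, extension = stem.rpartition('_')
    if not underscore or not name:
        return None
    return name + '.' + extension


def rename_all(directory):
    """Rename every matching file under directory, including subfolders."""
    for dirpath, _dirnames, filenames in os.walk(directory):
        for filex in filenames:
            filey = fixed_name(filex)
            if filey is None:
                continue  # leave files that don't match alone
            print(filex, '->', filey)
            shutil.move(os.path.join(dirpath, filex),
                        os.path.join(dirpath, filey))


print(fixed_name('LogParser_dll.B1735C0B_1CB5_4257_8281_92109AE41CE6'))
```

To use it, call rename_all with the working directory (for example the DIRX path from the script above). Like the original script, it isn't safe to run twice: an already fixed name such as my_notes.txt would become my.notes, so point it at a fresh extraction.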
<span style="font-family: Courier New; font-size: large;"></span><br />
<span style="font-family: inherit;">And voilà - I've got LogParser without having to bother our IT people for an install.</span><br />
<span style="font-family: inherit;"> </span><br />
<span style="font-family: inherit;">I'm probably late to the party on this msi extraction concept. Still, I thought there might be other people who are as unaware of it as I was, so I'm blogging it. Thanks for having a look.</span><br />Carl Trachtehttp://www.blogger.com/profile/12363048245012413049noreply@blogger.com0