Saturday, October 17, 2015

Storing and Displaying Images in Postgresql

Last post I set up a toy (literally) Postgresql database for my model train car collection.  A big part of the utility of the database is its ability to store images (pictures or photos) of the real life prototype and model train cars.  Postgresql (based on my Google research) offers a couple of methods of doing that.  I'll present how I accomplished this here.  The method I chose suited my home needs.  For a commercial or large scale project, something more efficient in the way of storage and speed of retrieval may be better.  Anyway, here goes.

I chose to store my photos as text representations of binary data in Postgresql database table columns with the text data type.  This decision was mainly based on my level of expertise and the fact that I am doing this for home use as part of a learning experience.  Storing the binary data as text inflates their size by a factor of two - very inefficient for storage.  For home use in a small database like mine, storage is hardly an issue.  At work I transfer a lot of binary data (3 dimensional mesh mined solids) to a distant server in text format using MSSQL's bcp.  Postgresql is a little different, but I am familiar with the general idea of stuffing a lot of text in a database column.
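To make the factor of two concrete, here is a minimal sketch.  The sample bytes are made up (not a real photo); the point is only that hexlify writes two text characters for every byte it reads:

```python
import binascii

# Made-up stand-in for the bytes of a photo file.
sample = bytes(range(256)) * 4

hextext = binascii.hexlify(sample)

# Each byte becomes two hex digits, so the text is twice as long.
print(len(sample), len(hextext))

# unhexlify reverses the conversion exactly.
restored = binascii.unhexlify(hextext)
print(restored == sample)
```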

In order to get the data into comma delimited rows without dealing with a long, unwieldy string of text from the photos, I wrote a Python script to do it:

#!python3.4

"""
Prepare multiple rows of data
that includes a hexlify'd
picture for a column in
a table in the model train
database.
"""

import binascii
import os

UTF8 = 'utf-8'
# LATIN1 = 'latin-1'

INFOFILE = 'infoiii.csv'

PICTUREFILEFMT = '{:s}.{:s}'
ROWFILEOUTFMT = '{:s}row'

JPG = 'jpg'
PNG = 'png'

COMMA = ','

PATHX = '/home/carl/postgresinstall/workpictures/multiplecars/'

PATHXOUT = PATHX + 'rows/'

PHOTOMSG = 'Now doing photo {:s} . . .'

def checkfileextension(basename):
    """
    With the basename of an image file
    returns True for jpg and false for
    anything else (png).
    """
    return os.path.exists(PATHX +
        PICTUREFILEFMT.format(basename, JPG))

with open(PATHX + INFOFILE, 'r', encoding=UTF8) as finfo:
    for linex in finfo:
        newlineparts = [x.strip() for x in linex.split(COMMA)]
        photox = newlineparts.pop()
        print(PHOTOMSG.format(photox))
        # Check for jpg or png here
        # XXX - this could be better - could actually
        #       check and return actual extension;
        #       more code and lazy.
        extension = ''
        if checkfileextension(photox):
            extension = JPG
        else:
            extension = PNG
        with open(PATHX +
            PICTUREFILEFMT.format(photox,
                extension), 'rb') as fphoto:
            contents = binascii.hexlify(fphoto.read())
            liney = COMMA.join(newlineparts)
            liney += COMMA
            liney = bytes(liney, UTF8)
            liney += contents
            with open(PATHXOUT +
                ROWFILEOUTFMT.format(photox), 'wb') as frow:
                frow.write(liney)

print('\nDone\n')
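The XXX comment in the script mentions that the extension check could return the actual extension instead of True or False.  A minimal sketch of that idea follows.  The function name findextension, its parameters, and the temporary directory used for the demonstration are all hypothetical and not part of the script above:

```python
import os
import tempfile

def findextension(directory, basename, candidates=('jpg', 'png')):
    """
    Returns the first extension in candidates for which
    directory/basename.extension exists, or None if no
    such file is found.
    """
    for ext in candidates:
        filename = '{:s}.{:s}'.format(basename, ext)
        if os.path.exists(os.path.join(directory, filename)):
            return ext
    return None

# Demonstration in a throwaway directory with one empty png file.
with tempfile.TemporaryDirectory() as tmpdir:
    open(os.path.join(tmpdir, 'G39A.png'), 'wb').close()
    print(findextension(tmpdir, 'G39A'))     # png
    print(findextension(tmpdir, 'missing'))  # None
```

Returning None for a missing file would also let the caller skip a row with a message rather than fail when trying to open a png that is not there.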


The basic gist of the script is to turn each line of the info file, together with the photo it names, into a "row" file that can later be imported into a table in Postgresql.  The paths in the capitalized "constants" would have to be adjusted for your situation (I tend to go overboard on capitalized constants because I'm a lousy typist and want to avoid screwing up and then having to debug my typos).  The INFOFILE referred to in the script has roughly the following format:

<column1data>, <column2data>, . . . , <photofilename>

So the idea is to take each line of a comma delimited file, encode it in UTF-8, and stuff the binary data from the (correct) photo at the end as hex text.  I designed my database tables with photos (I use the column name "picture") so that the text column holding the picture is the last one - this is kind of a hack, but it made scripting this easier.
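As a small illustration of that row format, here is a sketch of the same steps applied to a single line held in memory.  The function name buildrow, the example URL, the comment text, and the three photo bytes are made up for the example:

```python
import binascii

def buildrow(infoline, photobytes, encoding='utf-8'):
    """
    Builds one importable row: the column data from
    infoline (minus the trailing photo file name),
    a comma, and the photo bytes as hex text.
    """
    parts = [x.strip() for x in infoline.split(',')]
    # The last field is the photo file name, not column data.
    parts.pop()
    prefix = ','.join(parts) + ','
    return bytes(prefix, encoding) + binascii.hexlify(photobytes)

# Made-up info line and a three byte stand-in for a photo.
row = buildrow('G-39A Ore Jenny, http://example.com/g39a, ore car, G39A',
               b'\xff\xd8\xff')
print(row)
# b'G-39A Ore Jenny,http://example.com/g39a,ore car,ffd8ff'
```

Note that a plain split on commas means the column data themselves cannot contain commas; that limitation applies to this sketch just as it does to a simple comma delimited info file.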

An example of importing one of these "row" files into the database table from within psql:

$ psql hotrains carl
Password for user carl:
psql (9.4.1)
Type "help" for help.

hotrains=# \d
                List of relations
 Schema |          Name          | Type  | Owner
--------+------------------------+-------+-------
 public | rollingstockprototypes | table | carl
(1 row)

hotrains=# \d rollingstockprototypes
     Table "public.rollingstockprototypes"
  Column  |          Type          | Modifiers
----------+------------------------+-----------
 namex    | character varying(50)  | not null
 photourl | character varying(150) | not null
 comments | text                   | not null
 picture  | text                   | not null
Indexes:
    "rsprotoname" PRIMARY KEY, btree (namex)

hotrains=# COPY rollingstockprototypes FROM '/home/carl/postgresinstall/G39Arow' (DELIMITER ',');

COPY 1


My Python script for actually displaying a photo or image is a little hacky in that it requires checks for the size of the output versus the size of the information pulled from the Postgresql database table.  My original script would show the picture piped to the lightweight UNIX image viewer feh as partially complete.  In order to get around this I put a timed loop in the script to check that the image data were about half of the size of the text data pulled.  It works well enough, if slowly at times:

#!/usr/local/bin/python3.4

"""
Try to mimic processing of image
coming out of postgresql query
as a stream.
"""

import binascii
import os
import time
import sys

import argparse

# Name of file containing psql \copy hex output (text).
HEXFILE = '/home/carl/postgresinstall/workpictures/hexoutput'

# 2.5 seconds max delay before abort.
# Enough time to write most big pixel
# jpg's, it appears.
MAXTIME = 2.5
PAUSEX = 0.25

# Argument name.
PICTURENAME = 'picturename'

parser = argparse.ArgumentParser()
parser.add_argument(PICTURENAME)
args = parser.parse_args()
print(args.picturename)

# Name of picture file
# written from hex query.
PICNAME = args.picturename

# Extensions feh recognizes.
PNG = 'png'
JPG = 'jpg'

FILEEXTENSIONMSG = '\nFile extension {:s} detected.\n'
UNRECOGNFILENAME = '\nUnrecognized file extension for picture '
UNRECOGNFILENAME += '{:s}\n'
ABORTMSG = '\nSorry, no data available for feh.  Aborting.\n'

SLEEPMSG = '\nSleeping {:2.2f} seconds . . .\n'

SIZEHEXFILEMSG = '\nsize of hex output = {:d}\n'
SIZEBINARYMSG = '\nsize of binary file = {:d}\n'
SIZERATIOMSG = '\nsize of hex output/size of binary file '
SIZERATIOMSG += '{:05.5f}\n'

ACCEPTABLEHEXTOBINRATIO = 1.99
ABORTMSGTOOSMALL = '\nSorry, not enough data to show a '
ABORTMSGTOOSMALL += 'complete picture.  Aborting.\n'

extension = PICNAME[-3:]
if extension == PNG:
    print(FILEEXTENSIONMSG.format(PNG))
elif extension == JPG:
    print(FILEEXTENSIONMSG.format(JPG))
else:
    print(UNRECOGNFILENAME.format(extension))
    print(ABORTMSG)
    sys.exit()

PICFILEFMT = '/home/carl/postgresinstall/workpictures/{:s}'
FEHFMT = 'feh -g 400x300+200+200 {:s}'

# Length of binary string.
lenx = 0
# 2 variables track changes in size of
# hex output from query in psql.
sizex = 0
sizexnew = 0
# Tracks time spent sleeping.
totaltimewait = 0.0

while totaltimewait < MAXTIME:
    # Try to make sure hex file is completely written.
    sizexnew = os.path.getsize(HEXFILE)
    if sizexnew > sizex or sizexnew == 0:
        sizex = sizexnew
        print(SLEEPMSG.format(PAUSEX))
        time.sleep(PAUSEX)
        totaltimewait += PAUSEX
    elif sizexnew == sizex:
        with open(HEXFILE, 'rb') as f2:
            with open(PICFILEFMT.format(PICNAME), 'wb') as f:
                strx = binascii.unhexlify(f2.read().strip())
                lenx = len(strx)
                print(SIZEHEXFILEMSG.format(sizexnew))
                print(SIZEBINARYMSG.format(lenx))
                print(SIZERATIOMSG.format(sizexnew/lenx))
                f.write(strx)
        break

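# I don't want part of a picture.
# lenx is still 0 if the loop timed out before
# any data were read; check it to avoid dividing by zero.
if not (lenx > 0 and
        sizexnew/lenx > ACCEPTABLEHEXTOBINRATIO):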
    print(ABORTMSGTOOSMALL)
    sys.exit()

# Pops up picture on screen.
os.system(FEHFMT.format(PICFILEFMT.format(PICNAME)))

print('\nDone\n')


Let's see if we can get a look at this in action - example of call from within psql:

hotrains=# \copy (SELECT decode(picture, 'hex') FROM rollingstockprototypes WHERE namex = 'G-39A Ore Jenny') to program 'cat > /home/carl/postgresinstall/workpictures/hexoutput | imageshowiii.py'
COPY 1
hotrains=#


And a screenshot of a (hopefully acceptable) result:


Depending on which directory I've logged into psql under, I may have to type the full paths of the output and Python file.

There is more I could do with this, but for now I'm OK with it.  Writing to a file and then checking on its size is slow.  There is probably a way to write to memory and check what's there, but I got stuck on that and decided to go with the less efficient solution.
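For what it's worth, here is a minimal sketch of the in-memory idea.  It assumes the hex text could be piped straight to the script's standard input as plain hex digits (like the contents of the hexoutput file) instead of going through a file on disk; whether psql's output can be fed that way without extra cleanup is not verified here.  The function name readpicture and the fake six byte "picture" are made up for the example:

```python
import binascii
import io

def readpicture(stream):
    """
    Reads hex text from a binary stream until end of
    file and returns the decoded picture bytes.
    """
    # read() with no argument blocks until the writer
    # closes the stream, so no size polling is needed.
    hextext = stream.read().strip()
    return binascii.unhexlify(hextext)

# Stand-in for standard input: hex text plus a trailing newline.
fakestdin = io.BytesIO(binascii.hexlify(b'\x89PNG\r\n') + b'\n')
picture = readpicture(fakestdin)
print(len(picture))  # 6
```

In real use the stream would be sys.stdin.buffer rather than the BytesIO object, and the returned bytes could be written once to the picture file before calling feh.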

Thanks for stopping by.

1 comment:

  1. Consider base64 (https://docs.python.org/3/library/base64.html) instead of binascii.hexlify(), that way the overhead is just 33% instead of 100%.
