Saturday, October 17, 2015

Setting Up Toy Postgresql Database on OpenBSD

This isn't a Python scripting post, but the next one will be on the same topic.  In this post I get a Postgresql database set up on my OpenBSD laptop and get familiar with the Postgresql environment.

I primarily use Microsoft SQL Server and vendor supplied database schemas at work.  I know Postgresql has a good reputation among open source databases, but I haven't had an opportunity to use it in a work environment.  (I had a brief brush with Jigsaw years back - a competitor to Modular's MSSQL-based Powerview (Dispatch) in the pit mining truck tracking database space - but that doesn't count.)

Anyway, as I've noted in previous posts, I run OpenBSD as my operating system on my laptop at home.  The OpenBSD project has a package for Postgresql.

The first order of business is to install the Postgresql server package.  First, I'll point PKG_PATH at an FTP mirror from within the ksh shell:

$ export PKG_PATH=ftp://ftp3.usa.openbsd.org/pub/OpenBSD/5.7/packages/i386/

That ftp3.usa.openbsd.org server is the one in Boulder, Colorado - that's the one I usually use.  I'm in Tucson, Arizona in the Mountain timezone, so it kind of makes sense to use that one.  My understanding is that, in general, you want to use a mirror away from the main one to spread out the bandwidth and server use for the OpenBSD (or any other open source) project.
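
The release and architecture parts of that path can also be assembled rather than typed by hand.  This is a minimal sketch: the pkg_path function name is made up for illustration, and it assumes the mirror keeps the /pub/OpenBSD/<release>/packages/<arch>/ layout shown above.

```shell
# pkg_path is a hypothetical helper, not part of OpenBSD.
# Usage: pkg_path MIRROR RELEASE ARCH
pkg_path() {
    printf 'ftp://%s/pub/OpenBSD/%s/packages/%s/\n' "$1" "$2" "$3"
}

# Hard-coded values reproduce the path used above.
PKG_PATH=$(pkg_path ftp3.usa.openbsd.org 5.7 i386)
export PKG_PATH
echo "$PKG_PATH"

# On the OpenBSD box itself, uname -r and uname -m should supply the
# release and architecture (an assumption worth checking with echo first):
#   PKG_PATH=$(pkg_path ftp3.usa.openbsd.org "$(uname -r)" "$(uname -m)")
```

Swapping in a different mirror then only means changing the first argument.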

Now to install the package - this has to be done as root.  I use sudo for this (sudo's replacement, as I understand it, in OpenBSD 5.8 will be doas(1) although you'll still be able to get sudo(1) as a package).

$ echo $PKG_PATH 
 ftp://ftp3.usa.openbsd.org/pub/OpenBSD/5.7/packages/i386/

$ sudo pkg_add postgresql-server
quirks-2.54 signed on 2015-03-09T11:04:08Z
No change in quirks-2.54
postgresql-server-9.4.1p1 (extracting)
1%
1%
2%
3% ********


<etc.>

100%
postgresql-server-9.4.1p1 (installing)
0% useradd: Warning: home directory `/var/postgresql' doesn't exist, and -m was not specified
postgresql-server-9.4.1p1 (installing)|
1%
1%
2%
3% ********


<etc.> 

100%

postgresql-server-9.4.1p1: ok
The following new rcscripts were installed: /etc/rc.d/postgresql
See rcctl(8) for details.
Look in /usr/local/share/doc/pkg-readmes for extra documentation.
$


Given an internet connection with decent speed, this all goes pretty quickly.  The first set of percent numbers is the download of the gzipped tar package, the second is the unpacking and install of the Postgresql binaries in the proper location in the operating system file hierarchy.

For years I had some trouble getting my head around setting up users for Postgresql and running the daemon.  Much of my database experience is as an application user at work using Microsoft SQL Server.  We use Windows Authentication there primarily.  Working on my own UNIX-based (OpenBSD) home system is a little different.

Most of the problems I've had getting over this user/security hump related to my lack of a good strong grasp of UNIX users and permissions (the could-do-it-in-my-sleep kind of grasp).  OpenBSD is a bit unique in that it has a special name for the postgresql unprivileged user:  _postgresql.  The underscore is a convention in OpenBSD for this general class of user, usually associated with a daemon that runs on startup or gets started by root and doesn't have a login (nor a password).  Michael Lucas spends several pages on a good summary of the rationale behind this, its history and its conventions in his authoritative OpenBSD book.
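
To see which accounts follow the underscore convention on a given machine, something like the following can help.  It's a sketch only: daemon_users is a made-up helper name, and it assumes the usual colon-separated passwd layout with the login shell in the seventh field.

```shell
# daemon_users is a hypothetical helper, not an OpenBSD tool. It prints the
# name and login shell of every account whose name starts with an underscore,
# reading a passwd-format file (default: /etc/passwd).
daemon_users() {
    awk -F: '/^_/ { print $1, $7 }' "${1:-/etc/passwd}"
}

daemon_users
```

On the laptop described here, _postgresql should turn up in that list once the package install has run its useradd step.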




So, we want to take a look at the directory designated for Postgresql's data, /var/postgresql:

$ ls -lah /var | grep post


drwxr-xr-x   2 _postgresql  _postgresql   512B May 19 17:52 postgresql

$ cd postgresql

There is no data directory there (just . and .. in the /var/postgresql directory - the 2 in the ls output).  This is typically where I would get stuck in the past.  I ended up doing it manually . . . and wrong, or at least in a way that was more difficult than necessary.  Anyway, I recorded it that way, so I'll blog it as executed.

What I had difficulty understanding before was the whole unprivileged user concept.  Basically you need to use su to log on as root, then further su to log on as _postgresql:

# THIS IS AN UNNECESSARY STEP - DON'T DO THIS

$ su
Password:
# su - _postgresql
$ mkdir /var/postgresql/data
$ ls -lah /var/postgresql
total 12
drwxr-xr-x   3 _postgresql  _postgresql   512B Jun  4 19:06 .
drwxr-xr-x  23 root         wheel         512B May 19 17:52 ..
drwxr-xr-x   2 _postgresql  _postgresql   512B Jun  4 19:06 data
$ exit
# exit
$


# END UNNECESSARY STEP

Now I need a database cluster.  I want to initialize it with support for UTF-8 because I have some text data with umlauts in it (non-ASCII):

$ su
Password:
# su - _postgresql
$ initdb -D /var/postgresql/data -U postgres -A md5 -E UTF8 -W


The files belonging to this database system will be owned by user "_postgresql".
This user must also own the server process.

The database cluster will be initialized with locale "C".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 30
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /var/postgresql/data/base/1 ... ok
initializing pg_authid ... ok
Enter new superuser password:
Enter it again:
setting password ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... not supported on this platform
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    postgres -D /var/postgresql/data
or
    pg_ctl -D /var/postgresql/data -l logfile start

$ exit
# exit
$ whoami
carl
$ pwd
/home/carl


A couple things:

1) There's a line in the output about fixing permissions on the existing data directory (this will show up as highlighted on the blog, possibly not in the planetpython blog feed) - had I done this correctly (just let initdb make the directory itself), that line would look something like this (I created another cluster while writing the blog just so I would understand how to do it right):

creating directory /var/postgresql/data4 ... ok


Right there in the initdb(1) man page:  "Creating a database cluster consists of creating the directories in which the database data will live . . ."  The man page goes on to explain how to get around permission problems, etc. in this process.  Note to self:  read the man page . . . carefully.

2) What I also learned is that you can make as many database clusters as you want, all with different data directories.  postgres is the superuser name you see in the documentation and /var/postgresql/data is the directory, but, as demonstrated above in the output, you could put your data in a folder called data4.  If you gave a different name at the -U switch in the initdb command, the superuser name would be different too.  Or you could have more than one cluster with postgres named superusers but with different passwords.

All that said, one cluster per physical box and the conventional names are plenty for me - I'm just trying to get used to the Postgresql environment and get started.
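
With more than one data directory lying around, it helps to be able to tell an initialized cluster from a plain directory.  A minimal sketch, relying on the PG_VERSION file that initdb writes at the top of a data directory; the is_cluster function name is hypothetical:

```shell
# is_cluster is a hypothetical helper: it succeeds if DIR looks like an
# initialized Postgresql data directory (initdb writes a PG_VERSION file there).
is_cluster() {
    [ -f "$1/PG_VERSION" ]
}

# Report on the two directories mentioned above.
for dir in /var/postgresql/data /var/postgresql/data4; do
    if is_cluster "$dir"; then
        echo "$dir: cluster, version $(cat "$dir/PG_VERSION")"
    else
        echo "$dir: not an initialized cluster (or not readable)"
    fi
done
```

Since initdb tightens the permissions on the data directory, this most likely needs to be run as _postgresql (or root) to see inside it.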

At this point I need to start up the Postgresql daemon.  In the package install above, the output mentions an rc script /etc/rc.d/postgresql.  This is run by root - below is a demo of using it manually with su (instead of using it as part of an rc startup sequence at boot):

$ su
Password:

# /etc/rc.d/postgresql start
postgresql(ok)
# pgrep postgres
6960
10175
4748
29053
32758
26201
# /etc/rc.d/postgresql stop                                                   
postgresql(ok)
# pgrep postgres


All I did there was start the Postgresql daemon with the installed rc script, check to see that its associated processes are running, then stop the daemon with the same script.

Me being me, I can't leave good enough alone.  I want the control of starting and stopping the daemon when I decide to (I am running this on a laptop).  As I understand it, pg_ctl is a wrapper program provided with the Postgresql install for even more low level commands and functionality.  I use pg_ctl to run the daemon and start it with the _postgresql user account:

$ su
Password:
 

# su - _postgresql
$ pg_ctl -D /var/postgresql/data -l firstlog start
server starting
$ exit
# exit
$


I asked pg_ctl to make a specific log file for this session (firstlog - this will go in directory /var/postgresql/).  The logs are human readable and I wanted to study them later to see what's going on (there's all kinds of stuff in there about autovacuum and what not - sorry, we're not covering that in this blog post - but I'll have it available later).
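
One log file per session is easier to keep straight if the name carries a timestamp.  The following is a sketch only: the session_log and pgstart function names and the pg-YYYYMMDD-HHMMSS.log naming scheme are made up for illustration, and pgstart is meant to be run as _postgresql, as in the session above.

```shell
PGDATA_DIR=/var/postgresql/data    # data directory from the initdb step

# session_log is a hypothetical helper: prints a log file name with a timestamp.
session_log() {
    printf 'pg-%s.log\n' "$(date +%Y%m%d-%H%M%S)"
}

# pgstart is a hypothetical wrapper around the pg_ctl start command used above.
pgstart() {
    log=$(session_log)
    echo "logging to $log"
    pg_ctl -D "$PGDATA_DIR" -l "$log" start
}

# Show what the next log file would be called without starting anything.
session_log
```

After su - _postgresql, calling pgstart would start the server, and pg_ctl -D /var/postgresql/data status reports whether it is running.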

Shutting down (stopping) the daemon is pretty simple with pg_ctl - just a few more keystrokes than if I had done it from root with the rc script:

$ su
Password:
# su - _postgresql
$ pg_ctl -D /var/postgresql/data stop
waiting for server to shut down.... done
server stopped
$ exit
# exit
$ whoami
carl
$
 

Great - so I'm good for getting the daemon going when I want it and for designating my own specific log files per session.  Now to create a user and get to work:

(with daemon running):

$ psql -U postgres
Password for user postgres:
psql (9.4.1)
Type "help" for help.

postgres=# CREATE ROLE carl SUPERUSER;
CREATE ROLE
postgres=# ALTER ROLE carl PASSWORD 'xxxxxxxx';
ALTER ROLE
postgres=# ALTER USER carl PASSWORD 'xxxxxxxx' LOGIN;
ALTER ROLE
postgres=# \q

$

I created a user/role carl with SUPERUSER capabilities within this instance of Postgresql.  It's a bit ugly and I'm not sure I've done this correctly or the easiest way.  Also, and of importance, I have given Postgresql user carl (not OpenBSD user carl) all permissions on everything.  Really, carl only needs permissions to work on the database he's working on.  Josh Drake (@linuxhiker on twitter) pointed this out to me.  I am grateful for this.  He is right.  I am lazy.
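
A narrower setup along the lines Josh suggested might look like the following.  This is a sketch, not what I actually ran: it assumes neither the role nor the database exists yet, and the file name and /tmp location are arbitrary.  Instead of SUPERUSER, carl gets LOGIN plus ownership of a single database, which the postgres superuser creates and hands over.

```shell
# Write the SQL to a file; the path is arbitrary.
sqlfile=${TMPDIR:-/tmp}/carl_role.sql

cat > "$sqlfile" <<'EOF'
-- carl may log in but gets no cluster-wide privileges
CREATE ROLE carl LOGIN;
-- carl owns this one database
CREATE DATABASE hotrains OWNER carl;
EOF

cat "$sqlfile"

# With the daemon running, the postgres superuser would apply it with:
#   psql -U postgres -f "$sqlfile"
# and then set carl's password interactively from inside psql with:
#   \password carl
```

Setting the password with \password keeps it out of the SQL file and out of the shell history.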

Now to create my database.  I got into model trains around Christmas of 2014 and went crazy collecting stuff and setting up a layout.  I needed to somehow keep track of all the cars before it all got too unwieldy.

$ psql postgres carl
Password for user carl:
psql (9.4.1)
Type "help" for help.

postgres=# CREATE DATABASE hotrains;
CREATE DATABASE
postgres=# \q



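The command line entry to start psql is something I was a bit fuzzy on.  The first argument, postgres, is the name of a database:  initdb creates it by default (the "copying template1 to postgres" line in the initdb output above), and it's a handy one to connect to when you don't yet have a database of your own to work on.  The second argument, carl, is the user to connect as.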

I'm not going to post the full database code for the sake of brevity - it's only 11 tables but that's a bit much for a blog post.  Instead I'll post a graphic schema I made and talk to it a little bit before posting one related SQL code snippet.

Disclaimer:  I'm not a designer.  This schema diagram I did with Dia, a fairly lightweight Linux/UNIX desktop tool for flowcharts and stuff.  I've never met a color palette or font choice I could simply let be.  Asking me to do a flowchart with a lot of leeway on design is like leaving a two year old home alone with a Crayola 64 pack of crayons and the 300 year old family Bible - it can't end well.

All that said, I find schema diagrams helpful for showing relationships between tables and having an ugly one is better than none at all.  I've embedded an svg version of it below; hopefully it shows up on the planetpython feed:




The focus of my crude toy database design was the use of foreign keys to maintain consistency in naming things I want to track:  rail name for example.  I went with "Santa Fe" where I could have gone with (and probably should have) "ATSF."  It doesn't matter as long as it's consistent and I know what it means.

Years ago I was called in to do some work on a blasting database at the mine.  There weren't any constraints on the entry of names of blasting materials, but what could go wrong?  There were only three or four products with four digit designators and "None."  Well . . . it was a mess.  I didn't want to take any chances on having a situation like that again, even, or especially, if I was doing all of the data entry.  Foreign keys it was!

Here's a quick dump of the code I used to create the validsidenumbers table.  The idea is to make sure the rail line or company name is consistent in all the side number records (yes, I did actually purchase some identical rolling stock with the exact same side numbers - it's a long story):

hotrains=# CREATE TABLE validsidenumbers (
railnamex        varchar(50) REFERENCES validraillines (namex),
sidenumber       varchar(50),
comments         text,
PRIMARY KEY (railnamex, sidenumber)
);
CREATE TABLE
hotrains=#


That REFERENCES keyword sees to it that I won't enter anything typo'd or goofy into that railnamex column.
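
To watch the constraint do its job, something like the following could be fed to psql.  It's a sketch under assumptions: the real validraillines table isn't shown in this post, so the one-column definition here is a stand-in, the side number 1234 is made up, and the file path is arbitrary.  The tables are created as temporary ones so they disappear when the session ends and leave the real tables alone.

```shell
# Write the demo SQL to a file; the path is arbitrary.
sqlfile=${TMPDIR:-/tmp}/fk_demo.sql

cat > "$sqlfile" <<'EOF'
-- Stand-in parent table (assumed definition; the real one is not shown).
CREATE TEMP TABLE validraillines (namex varchar(50) PRIMARY KEY);
CREATE TEMP TABLE validsidenumbers (
    railnamex   varchar(50) REFERENCES validraillines (namex),
    sidenumber  varchar(50),
    comments    text,
    PRIMARY KEY (railnamex, sidenumber)
);
INSERT INTO validraillines VALUES ('Santa Fe');
-- Accepted: 'Santa Fe' exists in validraillines.
INSERT INTO validsidenumbers VALUES ('Santa Fe', '1234', 'made-up number');
-- Rejected with a foreign key violation: 'Sante Fe' is a typo.
INSERT INTO validsidenumbers VALUES ('Sante Fe', '1234', NULL);
EOF

cat "$sqlfile"

# With the daemon running:
#   psql -f "$sqlfile" hotrains carl
```

The first two inserts should go through and the third should be refused, which is exactly the kind of typo the blasting database never caught.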

Next post is a Python one about storing images of the train cars in the database and displaying them from within psql.

Thanks for stopping by.

Comments:

  1. I have a problem that the OpenBSD install locale is set to "C" and when I run the same initdb command as you did here, I also see the message, 'The database cluster will be initialized with locale "C"' - and the encoding is in SQL_ASCII. I want it to be UTF-8 too, but it won't play ball. I wondered if you hit this snag too - as you have the same message about encoding above.

  2. I'm a bit out of my depth here, but I did do a google search:

    https://www.postgresql.org/docs/current/static/creating-cluster.html

    My understanding from looking at the doc is that if you use a locale other than "C" (DE for German, for instance) it will impact the performance and ability of Postgresql to perform regex and other text searches and sorting efficiently. This is related to collation.

    Now I'm really out of my depth, but my understanding is that "locale" and "encoding" are two different animals. I'm pretty sure initdb took that -E switch properly. I don't see any messages contradicting the UTF-8 thing and the umlauts I have stored in my database have worked out OK as far as I can tell.

    It could be I'm missing something. Sorry I wasn't more help.
