Sunday, December 6, 2009

Testing - baby steps (unittest)

A couple days ago I posted some code that dealt with reading bytes from a file and manually interpreting them (as opposed to using the Python3.x bytes type's decode() method).

It occurred to me afterwards that I really should have written some kind of test for the code, even if it was a fairly crude snippet.  I've heard everything from "Not testing is evil" within the Python community to "Look, that's developer stuff; you're not a developer; you're a geologist . . .", etc. in the workplace.  My personal thought is that if I'm writing code, I should be testing it no matter what anyone says.

The code below is my attempt to test what I wrote the other day - it did reveal a couple errors and omissions, so it ended up being a good use of time.  I've probably abused unittest's setUp() method a bit (making sure I was at the start of my test file).  Otherwise, I hope it's an acceptable start.

To the code!


import unittest
import handlebytes



class TestBytes(unittest.TestCase):
    def setUp(self):
        self.filenamex = '/usr/home/carl/pythonblog/foreignbytestest'
        self.fbx = handlebytes.FileByter(self.filenamex)
    def testreadchar(self):
        self.setUp()
        self.fbx.readchar()
        self.assertEqual(self.fbx.currentcharord, 

                         0x65e5)
        self.assertEqual(self.fbx.charbytes, 

                         b'\xe6\x97\xa5')
    def testgimmebyte(self):
        self.setUp()
        self.fbx.gimmebyte()
        self.assertEqual(self.fbx.currentbyte, 

                         b'\xe6')
    def testinterpfirstbyte(self):
        # one byte ASCII
        self.assertEqual(handlebytes.interpfirstbyte(b'\x7f'),
                         (1, 0x7f))
        # forbidden zone between ASCII and UTF-8 first byte
        self.assertEqual(

        handlebytes.interpfirstbyte(b'\xbf'),
                         handlebytes.ERRORX)
        # two bytes
        self.assertEqual(handlebytes.interpfirstbyte(b'\xd7'),
                         (2, 0x17))
        # three bytes
        self.assertEqual(handlebytes.interpfirstbyte(b'\xeb'),
                         (3, 0xb))
        # four bytes
        self.assertEqual(handlebytes.interpfirstbyte(

                          b'\xf4'), (4, 0x4))
        # five bytes
        self.assertEqual(handlebytes.interpfirstbyte(

                          b'\xf9'), (5, 0x1))
        # six bytes
        self.assertEqual(handlebytes.interpfirstbyte(

                          b'\xfd'), (6, 0x1))
        # beyond range
        self.assertEqual(handlebytes.interpfirstbyte(

                         b'\xfe'), handlebytes.ERRORX)
    def testinterpsubsqntbyte(self):
        self.assertEqual(handlebytes.interpsubsqntbyte(

                         b'\x9b'), 0x1b)
        # beyond range
        self.assertEqual(handlebytes.interpsubsqntbyte(

                         b'\xc0'), handlebytes.ERRORY)

if __name__ == '__main__':
    unittest.main()


And to the command line:

$ python3.1 unittest_handlebytes.py
.Not a valid first byte of a UTF-8 character sequence.
Not a valid first byte of a UTF-8 character sequence.
.Not a valid byte for a UTF-8 multibyte sequence.
..
----------------------------------------------------------------------
Ran 4 tests in 0.001s

OK
$


Ahhh . . . nirvana!

Update 7DEC09:  I took the advice of one of the commentors and clarified the setUp method use.  In addition I added a tearDown method and made separate tests for each initial UTF-8 byte.

Because a file is being read sequentially, it seemed safest to have two separate classes for the tests that read the file.


import unittest

import handlebytes



class TestChar(unittest.TestCase):

    def setUp(self):

        self.filenamex = '/usr/home/carl/pythonblog/foreignbytestest'

        self.fbx = handlebytes.FileByter(self.filenamex)

    def testreadchar(self):

        self.fbx.readchar()

        self.assertEqual(self.fbx.currentcharord, 0x65e5)

        self.assertEqual(self.fbx.charbytes, b'\xe6\x97\xa5')

    def tearDown(self):

        while 1:

            try:

                self.fbx.gimmebyte()

            except ValueError:

                break



class TestByte(unittest.TestCase):

    def setUp(self):

        self.filenamex = '/usr/home/carl/pythonblog/foreignbytestest'

        self.fbx = handlebytes.FileByter(self.filenamex)

    def testgimmebyte(self):

        self.fbx.gimmebyte()

        self.assertEqual(self.fbx.currentbyte, b'\xe6')

    def tearDown(self):

        while 1:

            try:

                self.fbx.gimmebyte()

            except ValueError:

                break



class TestFirstByte(unittest.TestCase):

    def testascii(self):

        # one byte ASCII

        self.assertEqual(handlebytes.interpfirstbyte(b'\x7f'),

                         (1, 0x7f))

    def testbadascii(self):

        # forbidden zone between ASCII and UTF-8 first byte

        self.assertEqual(handlebytes.interpfirstbyte(b'\xbf'),

                         handlebytes.ERRORX)

    def testtwobytes(self):

        # two bytes

        self.assertEqual(handlebytes.interpfirstbyte(b'\xd7'),

                         (2, 0x17))

    def testthreebytes(self):

        # three bytes

        self.assertEqual(handlebytes.interpfirstbyte(b'\xeb'),

                         (3, 0xb))

    def testfourbytes(self):

        # four bytes

        self.assertEqual(handlebytes.interpfirstbyte(b'\xf4'),

                         (4, 0x4))

    def testfivebytes(self):

        # five bytes

        self.assertEqual(handlebytes.interpfirstbyte(b'\xf9'),

                         (5, 0x1))

    def testsixbytes(self):

        # six bytes

        self.assertEqual(handlebytes.interpfirstbyte(b'\xfd'),

                         (6, 0x1))

    def testbeyondrange(self):

        # beyond range

        self.assertEqual(handlebytes.interpfirstbyte(b'\xfe'),

                         handlebytes.ERRORX)



class TestSubsqntByte(unittest.TestCase):

    def testgoodsubqbyte(self):

        self.assertEqual(handlebytes.interpsubsqntbyte(b'\x9b'), 0x1b)

    def testbadsubqbyte(self):

        # beyond range

        self.assertEqual(handlebytes.interpsubsqntbyte(b'\xc0'),

                         handlebytes.ERRORY)



if __name__ == '__main__':

    unittest.main()

The new run yields this output: 

$ python3.1 unittest_handlebytesii.py

Closed file /usr/home/carl/pythonblog/foreignbytestest.

.Closed file /usr/home/carl/pythonblog/foreignbytestest.

..Not a valid first byte of a UTF-8 character sequence.

.Not a valid first byte of a UTF-8 character sequence.

......Not a valid byte for a UTF-8 multibyte sequence.

..

----------------------------------------------------------------------

Ran 12 tests in 0.002s



OK

$

3 comments:

  1. Why not to separate testinterpfirstbyte and testinterpsubsqntbyte into classes of their own? It looks like it might make sense to split the commented parts as methods (unless they somehow depend on each other though I doubt that). Of course this means you have to define a superclass containing your setup method but that shouldn't be a problem.

    Also as you mentioned the usage of setUp method at the beginning of testreadchar and testgimmebyte looks suspicious. If there's some special reason for this, you should comment it in the code itself. Otherwise some poor maintenance coder would just get rid of those lines and have a nice WTF moment. :)

    ReplyDelete
  2. I've just started working with unittest as well, and it's been fun. You might want to double check the documentation, but I'm pretty sure setUp is run automatically before every other test function in the class.

    ReplyDelete
  3. @Juho - Thanks for the feedback and direction. I tried to implement your suggestions in an update.

    @Pedahzur - Thanks for the tip on setUp. I'm just not confident enough to assume anything right now. The module doc wasn't explicit (that I could tell) on this, so I refrained from assuming.

    ReplyDelete