It occurred to me afterwards that I really should have written some kind of test for the code, even if it was a fairly crude snippet. I've heard everything from "Not testing is evil" in the Python community to "Look, that's developer stuff; you're not a developer, you're a geologist . . ." in the workplace. My own view is that if I'm writing code, I should be testing it, no matter what anyone says.
The code below is my attempt to test what I wrote the other day. It did reveal a couple of errors and omissions, so it ended up being a good use of time. I've probably abused unittest's setUp() method a bit by calling it explicitly at the start of the file-reading tests (to make sure I was at the start of my test file). Otherwise, I hope it's an acceptable start.
To the code!
```python
import unittest

import handlebytes


class TestBytes(unittest.TestCase):
    def setUp(self):
        self.filenamex = '/usr/home/carl/pythonblog/foreignbytestest'
        self.fbx = handlebytes.FileByter(self.filenamex)

    def testreadchar(self):
        # explicit call: unittest has already run setUp() before this test
        self.setUp()
        self.fbx.readchar()
        self.assertEqual(self.fbx.currentcharord, 0x65e5)
        self.assertEqual(self.fbx.charbytes, b'\xe6\x97\xa5')

    def testgimmebyte(self):
        # explicit call: unittest has already run setUp() before this test
        self.setUp()
        self.fbx.gimmebyte()
        self.assertEqual(self.fbx.currentbyte, b'\xe6')

    def testinterpfirstbyte(self):
        # one byte ASCII
        self.assertEqual(handlebytes.interpfirstbyte(b'\x7f'), (1, 0x7f))
        # forbidden zone between ASCII and UTF-8 first byte
        # (0x80-0xbf are continuation bytes)
        self.assertEqual(handlebytes.interpfirstbyte(b'\xbf'),
                         handlebytes.ERRORX)
        # two bytes
        self.assertEqual(handlebytes.interpfirstbyte(b'\xd7'), (2, 0x17))
        # three bytes
        self.assertEqual(handlebytes.interpfirstbyte(b'\xeb'), (3, 0xb))
        # four bytes
        self.assertEqual(handlebytes.interpfirstbyte(b'\xf4'), (4, 0x4))
        # five bytes
        self.assertEqual(handlebytes.interpfirstbyte(b'\xf9'), (5, 0x1))
        # six bytes
        self.assertEqual(handlebytes.interpfirstbyte(b'\xfd'), (6, 0x1))
        # beyond range
        self.assertEqual(handlebytes.interpfirstbyte(b'\xfe'),
                         handlebytes.ERRORX)

    def testinterpsubsqntbyte(self):
        self.assertEqual(handlebytes.interpsubsqntbyte(b'\x9b'), 0x1b)
        # beyond range
        self.assertEqual(handlebytes.interpsubsqntbyte(b'\xc0'),
                         handlebytes.ERRORY)


if __name__ == '__main__':
    unittest.main()
```
And to the command line:
```
$ python3.1 unittest_handlebytes.py
.Not a valid first byte of a UTF-8 character sequence.
Not a valid first byte of a UTF-8 character sequence.
.Not a valid byte for a UTF-8 multibyte sequence.
..
----------------------------------------------------------------------
Ran 4 tests in 0.001s

OK
$
```
Ahhh . . . nirvana!
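Since handlebytes itself isn't reproduced in this post, here is a hypothetical sketch of interpfirstbyte() and interpsubsqntbyte() that would satisfy the assertions above. It is not the real module: the ERRORX/ERRORY sentinel values are placeholders, the decodechar() helper is invented for illustration, and unlike the real functions these don't print error messages. Like the tests, it follows the original UTF-8 design, which allowed sequences of up to six bytes; RFC 3629 later restricted UTF-8 to four.

```python
# Hypothetical sentinels; the real values in handlebytes are not shown.
ERRORX = 'ERRORX'
ERRORY = 'ERRORY'

# (lead-byte mask, expected prefix, sequence length, payload mask)
_LEAD_PATTERNS = [
    (0x80, 0x00, 1, 0x7f),  # 0xxxxxxx  ASCII
    (0xe0, 0xc0, 2, 0x1f),  # 110xxxxx
    (0xf0, 0xe0, 3, 0x0f),  # 1110xxxx
    (0xf8, 0xf0, 4, 0x07),  # 11110xxx
    (0xfc, 0xf8, 5, 0x03),  # 111110xx  (pre-RFC 3629 only)
    (0xfe, 0xfc, 6, 0x01),  # 1111110x  (pre-RFC 3629 only)
]


def interpfirstbyte(bytex):
    """Return (sequence length, payload bits) for a one-byte bytes object."""
    value = bytex[0]
    for mask, prefix, length, payload in _LEAD_PATTERNS:
        if value & mask == prefix:
            return (length, value & payload)
    # continuation bytes (10xxxxxx) and 0xfe/0xff are not valid lead bytes
    return ERRORX


def interpsubsqntbyte(bytex):
    """Return the six payload bits of a continuation byte (10xxxxxx)."""
    value = bytex[0]
    if value & 0xc0 == 0x80:
        return value & 0x3f
    return ERRORY


def decodechar(seq):
    # Hypothetical helper: combine a lead byte and its continuation bytes
    # into one code point by shifting in six payload bits per byte.
    if not seq:
        return ERRORX
    result = interpfirstbyte(seq[:1])
    if result == ERRORX:
        return ERRORX
    length, codepoint = result
    if len(seq) != length:
        return ERRORX
    for i in range(1, length):
        bits = interpsubsqntbyte(seq[i:i + 1])
        if bits == ERRORY:
            return ERRORY
        codepoint = (codepoint << 6) | bits
    return codepoint


print(hex(decodechar(b'\xe6\x97\xa5')))  # 0x65e5
```

The table-driven lookup keeps each lead-byte pattern on one line, which makes it easy to compare against the one-assertion-per-pattern tests.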
Update 7DEC09: I took the advice of one of the commenters and cleaned up my use of the setUp method. In addition, I added a tearDown method and made a separate test for each kind of initial UTF-8 byte.
Because a file is being read sequentially, it seemed safest to have two separate classes for the tests that read the file.
```python
import unittest

import handlebytes


class TestChar(unittest.TestCase):
    def setUp(self):
        self.filenamex = '/usr/home/carl/pythonblog/foreignbytestest'
        self.fbx = handlebytes.FileByter(self.filenamex)

    def testreadchar(self):
        self.fbx.readchar()
        self.assertEqual(self.fbx.currentcharord, 0x65e5)
        self.assertEqual(self.fbx.charbytes, b'\xe6\x97\xa5')

    def tearDown(self):
        # keep reading bytes until gimmebyte() raises ValueError (end of file)
        while True:
            try:
                self.fbx.gimmebyte()
            except ValueError:
                break


class TestByte(unittest.TestCase):
    def setUp(self):
        self.filenamex = '/usr/home/carl/pythonblog/foreignbytestest'
        self.fbx = handlebytes.FileByter(self.filenamex)

    def testgimmebyte(self):
        self.fbx.gimmebyte()
        self.assertEqual(self.fbx.currentbyte, b'\xe6')

    def tearDown(self):
        # keep reading bytes until gimmebyte() raises ValueError (end of file)
        while True:
            try:
                self.fbx.gimmebyte()
            except ValueError:
                break


class TestFirstByte(unittest.TestCase):
    def testascii(self):
        # one byte ASCII
        self.assertEqual(handlebytes.interpfirstbyte(b'\x7f'), (1, 0x7f))

    def testbadascii(self):
        # forbidden zone between ASCII and UTF-8 first byte
        self.assertEqual(handlebytes.interpfirstbyte(b'\xbf'),
                         handlebytes.ERRORX)

    def testtwobytes(self):
        self.assertEqual(handlebytes.interpfirstbyte(b'\xd7'), (2, 0x17))

    def testthreebytes(self):
        self.assertEqual(handlebytes.interpfirstbyte(b'\xeb'), (3, 0xb))

    def testfourbytes(self):
        self.assertEqual(handlebytes.interpfirstbyte(b'\xf4'), (4, 0x4))

    def testfivebytes(self):
        self.assertEqual(handlebytes.interpfirstbyte(b'\xf9'), (5, 0x1))

    def testsixbytes(self):
        self.assertEqual(handlebytes.interpfirstbyte(b'\xfd'), (6, 0x1))

    def testbeyondrange(self):
        self.assertEqual(handlebytes.interpfirstbyte(b'\xfe'),
                         handlebytes.ERRORX)


class TestSubsqntByte(unittest.TestCase):
    def testgoodsubqbyte(self):
        self.assertEqual(handlebytes.interpsubsqntbyte(b'\x9b'), 0x1b)

    def testbadsubqbyte(self):
        # beyond range
        self.assertEqual(handlebytes.interpsubsqntbyte(b'\xc0'),
                         handlebytes.ERRORY)


if __name__ == '__main__':
    unittest.main()
```
The new run yields this output:
```
Closed file /usr/home/carl/pythonblog/foreignbytestest.
.Closed file /usr/home/carl/pythonblog/foreignbytestest.
..Not a valid first byte of a UTF-8 character sequence.
.Not a valid first byte of a UTF-8 character sequence.
......Not a valid byte for a UTF-8 multibyte sequence.
..
----------------------------------------------------------------------
Ran 12 tests in 0.002s

OK
$
```
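TestChar and TestByte repeat the same setUp()/tearDown() pair. One way to share it is a common base class, since subclasses inherit the fixture methods and unittest still calls them around every test. The sketch below is self-contained, so it swaps the real FileByter for an in-memory io.BytesIO stream holding the UTF-8 bytes of U+65E5; that stand-in, the SAMPLE constant, and the FileTestBase name are assumptions for illustration only.

```python
import io
import unittest

# Stand-in for the real test file: the UTF-8 encoding of U+65E5.
SAMPLE = '\u65e5'.encode('utf-8')  # b'\xe6\x97\xa5'


class FileTestBase(unittest.TestCase):
    """Shared fixture: each test gets a fresh stream positioned at byte 0."""

    def setUp(self):
        self.stream = io.BytesIO(SAMPLE)

    def tearDown(self):
        self.stream.close()


class TestChar(FileTestBase):
    def testreadchar(self):
        data = self.stream.read(3)
        self.assertEqual(data, b'\xe6\x97\xa5')
        self.assertEqual(ord(data.decode('utf-8')), 0x65e5)


class TestByte(FileTestBase):
    def testgimmebyte(self):
        self.assertEqual(self.stream.read(1), b'\xe6')
```

Saved as a module, it can be run with `python -m unittest <module>`. The base class has no test methods of its own, so it contributes no tests when collected.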
Why not separate testinterpfirstbyte and testinterpsubsqntbyte into classes of their own? It might also make sense to split the commented parts into separate methods (unless they somehow depend on each other, though I doubt it). Of course, this means you'd have to define a superclass containing your setUp method, but that shouldn't be a problem.
Also, as you mentioned, the use of setUp at the beginning of testreadchar and testgimmebyte looks suspicious. If there's a special reason for it, you should explain it in a comment in the code itself. Otherwise some poor maintenance coder will just get rid of those lines and have a nice WTF moment. :)
I've just started working with unittest as well, and it's been fun. You might want to double check the documentation, but I'm pretty sure setUp is run automatically before every other test function in the class.
@Juho - Thanks for the feedback and direction. I tried to implement your suggestions in an update.
@Pedahzur - Thanks for the tip on setUp. I'm just not confident enough to assume anything right now. The module doc wasn't explicit (that I could tell) on this, so I refrained from assuming.
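Rather than assume, the call order can be checked directly. In this minimal, self-contained sketch, the LifecycleProbe class and the run_probe() helper are made up for the illustration. Each fixture and test method appends its name to a shared log, which shows unittest calling setUp() before and tearDown() after every test method. The default TestLoader runs the test methods in alphabetical order.

```python
import io
import unittest


class LifecycleProbe(unittest.TestCase):
    calls = []  # class-level log shared by every test instance

    def setUp(self):
        self.calls.append('setUp')

    def tearDown(self):
        self.calls.append('tearDown')

    def test_a(self):
        self.calls.append('test_a')

    def test_b(self):
        self.calls.append('test_b')


def run_probe():
    """Run LifecycleProbe quietly and return the recorded call order."""
    LifecycleProbe.calls.clear()
    suite = unittest.TestLoader().loadTestsFromTestCase(LifecycleProbe)
    unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return list(LifecycleProbe.calls)


print(run_probe())
# ['setUp', 'test_a', 'tearDown', 'setUp', 'test_b', 'tearDown']
```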