Monday, February 28, 2011

jython + java.lang.Character.UnicodeBlock

In my post on Unicode Blocks in regular expressions, I mentioned that CPython regular expressions don't support Unicode Blocks. Unicode Blocks are contiguous ranges of code points whose characters share something in common, typically a script or a purpose (Basic Latin, Greek, Halfwidth and Fullwidth Forms, and so on). Java has another feature that can help anyone interested in Unicode Blocks: the java.lang.Character.UnicodeBlock class. In jython:

Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
[Java HotSpot(TM) 64-Bit Server VM (Sun Microsystems Inc.)] on java1.6.0_22
Type "help", "copyright", "credits" or "license" for more information.
>>> from java.lang.Character import UnicodeBlock
>>> UnicodeBlock.of(236)
LATIN_1_SUPPLEMENT
>>> UnicodeBlock.of('a')
BASIC_LATIN
>>> UnicodeBlock.of(738)
SPACING_MODIFIER_LETTERS
>>> UnicodeBlock.of(922)
GREEK

>>> UnicodeBlock.of(0xffee)
HALFWIDTH_AND_FULLWIDTH_FORMS


The "of" method accepts either a character or an integer code point as its argument. It offers a quick way to find out roughly where a character sits in Unicode and what kind of character it might be.
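CPython has no built-in equivalent: the unicodedata module reports names and categories but not block membership. A rough stand-in is a lookup against a table of block ranges. The sketch below is only an illustration under stated assumptions, not a port of the Java class: `block_of` and `_BLOCKS` are hypothetical names, and the table holds just a handful of ranges copied from the Unicode block list, labelled with the Java constant names so the results line up with the jython session above.

```python
import bisect

# Hypothetical, hand-copied SUBSET of the Unicode block table: (start, end, name).
# Names mirror java.lang.Character.UnicodeBlock constants. The table is
# deliberately incomplete, so code points outside these ranges return None.
_BLOCKS = [
    (0x0000, 0x007F, "BASIC_LATIN"),
    (0x0080, 0x00FF, "LATIN_1_SUPPLEMENT"),
    (0x0100, 0x017F, "LATIN_EXTENDED_A"),
    (0x0180, 0x024F, "LATIN_EXTENDED_B"),
    (0x0250, 0x02AF, "IPA_EXTENSIONS"),
    (0x02B0, 0x02FF, "SPACING_MODIFIER_LETTERS"),
    (0x0300, 0x036F, "COMBINING_DIACRITICAL_MARKS"),
    (0x0370, 0x03FF, "GREEK"),
    (0x0400, 0x04FF, "CYRILLIC"),
    (0xFF00, 0xFFEF, "HALFWIDTH_AND_FULLWIDTH_FORMS"),
]
# Sorted block start points, used for binary search.
_STARTS = [start for start, _, _ in _BLOCKS]


def block_of(ch):
    """Return the block name for a one-character string or an integer code point."""
    cp = ch if isinstance(ch, int) else ord(ch)
    # Index of the last block whose start is <= cp.
    i = bisect.bisect_right(_STARTS, cp) - 1
    # Make sure cp actually falls inside that block (the table has gaps).
    if i >= 0 and cp <= _BLOCKS[i][1]:
        return _BLOCKS[i][2]
    return None


for arg in (236, 'a', 738, 922, 0xffee):
    print(block_of(arg))
```

Like "of", the helper takes either a character or a number. A full version would load every range from the Unicode Character Database's Blocks.txt instead of hard-coding a few.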
